162001 AAGGCCAGCC TGATGGCACT GATGTACATC TAAAAGAAAC ATTACTTTAT CTTCCCATGC

162061 TTCCTTACCA TTCTCCTTTA ATAGCACTAT AACATACCTT TTTTCCCTAC TCCAAGTACA 

162121 CAGCCTCACC TGCAGCAATT TCTGGGCTGA GCCCTGACAT TTTTCCTCCA GTTCCAGGAT 

162181 GTGGCTCTTG AGTTCATTGC TCTTCAGCCC CAGACCAGCC TCATAGTCCC TCAGTCTACT 

162241 CAGAGTCTGT TGTTCTTCTT TCTCCAGCCT CCAGAGATAA GACTTCTCTT CCTCATGTAG 

162301 GAAACACTGG AGATTCTTAA AGTCAGACCG GATTTTTTGT CTCTGAATCT GTACCTTCTC 

162361 CTGGAGTCAA GAAAGTATGG TCAAAAGGTG GAAGTAAACC AAATGTCCAT CTATGGATGA 

162421 ATGGATAAAC AAGAATGAAA GTCTGACACA CGCTACTACA TGACAAGCCT TGAAGACATT 

162481 CAAGCAAAAT AAGCCAGAAA CAAAAGGGCA AATATTGTAA GACTTTGCTT ATACAAGGCA 

162541 TCTGGAGTAG TTAAGTTCAT AGAGACAGAA AGTAAAATAG TGGTTACAAG GTGTTGGCAA 

162601 GACCAGAAAA TGGACAGTTA TTGTTTAATG GGTAGTGAGT TTCAGTTTAG AAGATGAAAG

162661 ATGAAACTGA GTTGCAGTTT GGAGATGGGA ATGGTGATGG TTGCACAACA ATGTAACAAT 

162721 GTAAAAGCAC TTAATTCTAC TGAACTATAT ACTTAAAAGT GGTTAAATGC TTAAGTGTTA 

162781 TATATATTTT CACACAAACA CACACACACA CACAATCAGC CACTGGGACA TTATTTTCTC 

162841 ATGAGTCACT GAAGCTGGAA GAATGTCCCC AGTTTCCTGC TGCAGAGTCA TGTGTGGGAG 

162901 GCAGGCACTC AGATGTGGAA GAGGTTGCCT CAGATTCCTT ATAGTCACCC AATTAATTTT
162961 CTTGTTCTTC AGCCAAGACA CAGGAGAAAG CTGGGTTAGG AGTGCTAGAT AATTTAATTG 
163021 TGAAACTAGG GCCAAGTTCA AACACTTTAT CAGTTACAAG GATAAAAAGA GGTTTTTACT 
163081 TATGATTTAA GAAGTTAGAT TTCTGAGTTG GAGCGATTTT CTTGAAGTAA AAGCTTATAA
163141 TGAACATCAC CCAGACTGGA TTTTAAGACA ACCAGGCTGG TAAGAGGGTC CATAATTCTT 
163201 GGCAGGGGGA GCTTTGAGTG TGACAGGCAT TTATTATGGT TAACTGAGAA ATACTGTTCT 
163261 ACTACCCTAG GGTCATCTTA AGCATTCCTA TGTGTAAGAC TGACAGAAAT CAAGTGAAAC 
163321 TCTCATCTGA GGAGATGTAA AGTTGCAATT TCCATTAGTG CTGTCTAAAT TAATGCAGTG 
163381 GGAGTGTGTA TTCAGGGCAA TTTGAATCTA TGTTCTTGGA TTGCAGTCTT CAAACTTGGC 
163441 CCAAATAAAC TCTCTACTTA TCTTAAAAAA ATAAAAATTA AAAAATAAAA ATAAATTCAT 
163501 ACAGTGTTTT GATGACTATG ATATAGAAGA AGGGTCTTTG ACTTAGGATG AGGTGGAATT 
163561 TTTGTGTAGG AGACAGGTGC AGCTTTAACT CTTGTATAGA CGGGTTTTCA TATATGTTAG 
163621 TTACAATCAA GGTCTTCCCC ATTGCCCAAG ATCCTAGAAA TGGGGGAAGT AAGAGTGTAC 
163681 TCAGGAGCTC AAGAGCAACA TCCACAAACA AAGATCAGGG TAGAGGTTAG AGAGGACTCC 
163741 TGAAAGAGAG AAAATTGGTA ATCAGCTTGT GGGATTTTAC TGCAAGCTAG TGAATTATAT 
163801 AAATATAAAG ATTGGTGCAA AAGTAATTGT GGTTTTTGCC TTTACTTTAA TGGCAAAGAC 
163861 CGCAATTACT TTTGCACAAA CCTAAATATT TCCATAAAAG AATGTGGCTC TGATAATGTG 
163921 GAGGTTAGTC AGCCACGGAA ATAATCTGAA AGTTTGTAGT TGCAAGTGTG TAGGTTGTTG 

163981 CATTACTTGT GATGTACTTA TAAATCAAGT ATAGGCCGGG TGCAGTGGCT CACGCCTGTA

164041 ATCCCAGCAC TTTGGGAGGC TGAGGTGGGT GAATCACGAG GTCAGGAGAT CAAGACCATC
164101 CTGGCCAACA TGGTGAAACC CCGTCTCTAC TAAAATACAA AAAATTAGCC AGGCATGGTA 
164161 GCACATGCCT GTAATCCCAG CTACTCAAGA GGCTGAGGCA GGGGAATTGC TTGAACCCGG 
164221 GAGGTGGACA TTGCAGTGAG CTGAGATCGC ACCACTACAC TCCAGCAAGA CTCCATCTCA 
164281 AAAAATAGTA ATAATTTAAA AATAAATAAA TAAATAAAGT ATATTTCTTT CATCAGCTTC 
164341 ATGAGCTAGA GTAGTATGAA TTTCAATCTG GAGTGATCCT GTTTTCTAAG TGTTCACAAA 
164401 GCTTGGTTTC TGTACCTGTA AAGTTGAGAG CCAGATGCTC CACTGTGGTA AAAGTGCCAG 
164461 GGTAATGAGT TGAGGCCTGC AAACCAGGTT TATTTTGACG TATTTAAAGT TTGAGACCCA 
164521 CTCGATGCTT TTTCTAGGTA AATAGTCATA CTAATTCTGC TTCTTCTGAC TGAAGTATCA 
164581 GGAATCCCAG CCAACTACAG TTTAAAGATG GAAAGATTGG TGCTAAATAC TCATGGATGT
164641 AAACCTGGAA CCAGGGGCAT AAGTACAAAT AATGGTTTCT TCCTTGGGTT TCATTTTTTC 
164701 AATCTGGTTT AGTGAGAATA AATCCTCATT GTGCTTTTCC TCAATCATCC CCTATGCCTA 
164761 AGCTCTAGAA TGGAAAATAG CTTGAGATCA ATGAAGTCAG ATTCTTACTT TCCATTTAGT
164821 TATTCGCATT GCTGTGGACA GCTTCTGCTC CGTACATCTG TCTTCAAGTT GCTTCAGTTT 
164881 TGTCACAGCT TTCTGGAGCT TTTCCTGAAG GAAAAATTTG ATAAGTGAAG CCTATTCAAT 
164941 TTGACTCTTC ATTAGGGACC TAGGGGGAAT CCCAATCTTC TAAGATATAT TTGAATAATA
165001 GTGAATATTT ATAGAGTCCT CATTGTTTTT TGCTAGAGAG CATGCTAAAG GCTATATGTG 
165061 CAGGAACATA CTGATCCCCT TGGCAACCCT GAATAGTTGG TAGGATTTTA AACTTCATTT 
165121 CTGTGCTGTA GAAAATGAGA CTAAGAAAGG GGTAAAATAA CTTGCCCAAA GGGCTATGAC 
165181 TGCCAGGTGG TGGAGCAACA ATTGCAATCT CATCTGCTGA CCCAGAGCCT GAGCTATGTC 
165241 CACCACTAGA GTCCTGCCAG GAAAAAGTTG GATATAGAAC AAGGTAATCA TCATCTAAAA

165301 GATTTTGTAA AACAACATGC TGAACCAAGC AAAACCAATA CCAGTGTTTG GCACACATGA

165361 AATTTTGTGT CTTATGAGTC AGGAAAAATC AGGATGCCAG CTGGTTATTA GAAACAGTTC 

165421 ATGGAAGAGG GGAATTCTGG TATCTTTTGA ACAATGGTAT CATGAATCCA ATTTAAAATG 

165481 ATTTAGTATT CATGTCAAGC TTTTAGCTTA TTCTTCAAAA CAGTTTCTCA TATTTCTATT 

165541 GAAAGTGATT TGAAGCTGAC CCAAATTGCT AATTGTAGTC AATGCTGAAA GAATTGTCTC 

165601 CTGTCCTCTG TAAACCCAAC AAGTATACTC ATTCATTCTC GAGTGTTCTC AGGAAAAGGT 

165661 TCTATGTAAC TGTTTTAGCA AAAGATGACA TTGTCCTTAC TATATGCCAA GTGCTATTCT

165721 ATGCATTCTA TATTTTAATG TCCTCAAAGC TTATAACCAC CTCCTGTGTA TGTGTTTTAG

165781 GGAGGGAGGA CACTGCTATT ATCCCCATTT ACAGATGGAG AAACCAAGGT GTGAAGACAT

165841 TAAGTAACGT GCCCAAAATT GCCCATCTAG TAAGTGACAA AACTCAATTT CAACATAAGC 

165901 TGGTTCCTTT TCTTACTACT TGGTGGAAAA GTAATTCAAA TGGGAATATG ATCATCGCAG 

165961 TTATTAGCTG CTCCATGGAG TTTAAGGAAG AGCTGCCATG AGCTGAGTGG TGGTCANGAT

166021 TGACATGTCC TTAGAAGGAC TTAGAGCCTT CATACAAGAC CACCTCTGCC TCATGGAGGA 

166081 CAGAATAAGG AGCCTGACAC TGGAGACAAC ATTTTCCTCA AATTTAGGCA GGACAGAGAA 

166141 GGAAAAAGGA CATCAGGACT ATGCCCATTC CTCCATGCTG CCAACAGCAA AGTCCCACCT 

166201 TCCTTAATAT GCTTTCTGGC AAGAAATCTG GATGGTACAC AAAACCTCTC CCTCTGCTTC 

166261 ACCTTCCACA ACCAAGCATT TCCAAATCTT TGACTCTTCT TCCTGAATCG TGCTTAAAAT 

166321 CTGCCCTCTC CTCCCTTTCT TATACGGATA GTTTGAATTT TACTCCTTGA TATTCCTTTT 

166381 ATCATAGACA TGCCACAGTA GCTGGGCACA GTGGTTCATG CCTCTAATCC CAGCATTTTG

166441 GGAGGCTGAG ATGGGAGGGA GACCAGGGGT TTGAGGCCAG TATAAGCAAG AAAGGCAGAC

166501 CATGTCTCTA CAAAAAATAA AAAAATTATC CAGGTATGGT GGGGCATCCC TGTAGTCCTA 

166561 GCTACTTGGG AGGCTGAGGT GGGAGGATTG CTTGAGCCCC AGAAGGTTGA GGCTGCAGTG 

166621 AGCCGAGATT GCACCATTGT ACTCCAACCT GGGATACAGA GCAAGACCCT ACCTCAGGAA

166681 AAAAAAAAAA AAAAAAAAAA AAAAGTAGAG GTACCAGAGT GATATTTTCA ATGTCACTGA

166741 CCCTTCATTC CCCAAATGAA AATCCCCCAA TAGGTGTTCA ATTTTTACGT GTCCTTCAGG 

166801 AGTTACTTCT AAGATGAACC ACTCTCTACC CTAAATGTCC CTCCCCACCA CCAAAACCAG 

166861 GGACCTCCAG GCAGACATTT TTGATGGTTT GTTTTCTTTA CTAGACTGTA GATACCTAAA 

166921 AGGTGATGGG TCTTTCTTCC CTGTTTTCAG GCCCTACTGC ATGGCTTTAC ATATTGTGGT 

166981 TTTTCAAATG ATATTCATGG TGTGAAACAA GAAAAAATGC GGGTGTTTGG TTTGAGAACA 

167041 ACCTGTTCTA AAGCAAAAAG AAATTCATCA TAACACAAAT GGATAGAGAT AAGAGTCCAA 

167101 CCATCCCATT GAAGGTCAGG ATGGACAGTC TAGATAATTG AGCAAGAAAT CATCATAAAC 

167161 TATTTTTCAG AAGAATGACA TGATGAAAGC TGTATTTCCA AGTCATAATG TTAGGTTTCA 

167221 AGTTAAATCA TCTCAGCTCC TGGGGAGCAG GATAAGACTT GGTACTTACC AAAGCTCCCG 

167281 GGCCCACACA CTCACCTTGT AGCCCTGGCA TACGTCTTCA ACAAGAGCTG TGGTGTGCCC 

167341 TTTGTGCTGT GGTGCCCGCT CACAGCGCCA GCAGATGAGC TGCCCCTCGT CTTCGCAGAA 

167401 CAGGTGGAAC TGCTCTCCGT GTTCCTCACA TGACATTTCT TGATCCGTCT CTTTGAGGGC

167461 TTCAATGAGG CTTCCCAGCT GCTTGTTGGG TCGGAGGCTA TCCATATGAA ATGGAGCCCG 

167521 ACACTGGGGA CAGCAGAATG TCTCCTGCCT CAGTTGCTTT TGGCTTGGGT TTTTAAAGAA 

167581 GTCTGTTATA CACAAGTGGC AGTAGCTGTG TCCACAGTTG ATGCTTACTG GGTTCGTCAT 

167641 CAGGCTCAGG CAGATGGAGC AGGTGGCTTC CTCCATCATC TTCTTGGTGC TGGTGGTTGA 

167701 GGCCATAGCT TTTATTGAAA AGCTCCAATA TTGGCTCTAG AGATGGAGAT GAAGCAGCCA 

167761 GAATTTTCCA CCGTGATGAA AATACACCTC ACCTGCACCT CTATGTGATG AGCTGGCTGC 

167821 AACTGACTTC CATAGGTCTT GAAGGTTTTC CTTCCAACCC CTATTATCTC ATTTTGTATT 

167881 GAAGAAAAGA GGACCTAAAA GGAAGAAGTT GAGGCTGAGG TTGTTTGGGC CACGTTTGAG

167941 AACTGCAACC CAAGTGCAGA GTTTCAAGTT GCCCTCATTA GCAAGCAGTT ACAAGTGGTT 

168001 GTTTAGAGGA AAAAAAGCAG TTTTAAAGCA GTTTTAAAGT TGTTTGCCAA GAATTTACAT 

168061 TAAAATAGCA TAAGCTTTTG ACTGGCTATA CATTGTTCTT TGTATTACAA ATCTCGGGAA 

168121 TATGTAGGTA ATAGATGAGG CAGCCAGTCA GGAACAAAAT GCTTTTAAAC ATGGGGTCTT 

168181 AACTGAAGAC CTATACTCCT GCCTCACTTG TCCTGATAAA TTTTGCATAC CTCACATAGC 

168241 TCAGACTGCT CTAAATTATT TCATTATTTT TCTTTTCTCA GTCTTCTAAC TTTTTTTTTT 

168301 TTTTTTAATG AGACGGAGTC TCACTCTGTC ACCCAGGCTG GAGTGCAGTG ACGCTATCTC 

168361 GGCTCACTGC ACCTCCGCCT CCCGGGTTCA AGCGATTCTC CTGCCTCAGC CTCCCGAGTA 

168421 GTAGCTGGGT CTACAGGTGT GCACCACTAC GCCCAGCTAA TTTTTGTATT TTTAGTAGAG 
168481 ATGGGGTTTC ACCATGTTGG TTGGCTCGAT CTCTTGACCT TGTGATCCAC CCGCCTCAGC

168541 CTCCCAAAGT GCCAGGATTA CAGGCATGAG CCACCGTGCC CAGCCTCTTT TTCTTTTCTT 

168601 ATAAGACAAG TTCTCGCTCT CTTGCCCAGG CTGTAGTGGA GGGCAGTGGC ATGACCACAG 

168661 CTCACTGCAG CCTCGACCTC CTGGGTTTAA GCAATCCTCC TGCCTCACCC TGGCAGAGTG 

168721 GCTGGGACTA CAGGTATGTG CCACCATGTC CAGCTAAAGT CTTCTCTCCA GAAAGAAGAA 

168781 ATGCATTGGA ATTTAGAGGA TACACAAACA TCTAGCTGTA TAGCTAATAC AGTAGCCACT 

168841 ATCATGAGTA GGAATTTAAA TTTAACTTAA TAAAAATTAA AATGAAAAAA TTCAGTTTTT 

168901 CTGTTCCAGT TGCCACATTT TGATTGCTTA ATAGTTGCAT GTGACTAGTG GCTACATAAC 

168961 AGCCTCAATA TACAACATTC TGTTATCACA GAAAGTTACC TTGGACCAAG TGCTGGGAGA 

169021 AGCAATGCAG GCTTCCTCAC AAAAGCTGTA AAAGAGAGAA CTCAGGGAGT GTGAAACTCT 

169081 TTCCTATTCT AGTTAACTTC AAGAATAATT GTTACCAGGC CAGCACGGTG GCTCACGCCT 

169141 GTAATCCTAG CACTTTGGGA AGCCGAGGCG GGCAGATCAC CTGAGGTCAG GAGTTTGAGA 

169201 CCAGCCTGAC CAACATGGCA AAACCTCATC TCTACTAAAA ATACAAAAAG TTAGCTAGAT 

169261 GTGGTGGTGC ACACCTGTAA TCCCAGCTGC TCAGGAGGCT GAGGAAGGAG AATGACTTGA 

169321 GCTCCGGAGG GGGAGGTTGC AGTGAGCCCA GATTACACCA CTGCACTCCA GCCTGGGTGA

169381 AAGAGCGAGA ATCTGTCTTA AAAAAAAAAA AAAGAATAAT TGGTACCAGA ATTACTCTTT 

169441 GTAATTAGTA GTAACACTTA TGCAATTGGG TGATCTGTGA CAGATTCCAT TGAAGGAGTA 

169501 TGGGGAGCTT CACCCCAATA TATGACTCCC TGGTATAATG AGTATTTTGA ATTAAAGGCC 

169561 CTTAGAGATC AGCAGATGCT GGAAGAGACT TTTCCCCTAT CTACATAAAG ACCAGTCACA

169621 CTAGACAAGA AGAACAATTG TTTTTCCTTC CAACCCCTAT TATCTCATTT TGTACTGAAG 

169681 AAAAGAGGAC TAAGAATGTA ACCAGACCTA ATCAGACACT TTCACAAAAT AATGTCTGTC 

169741 TCTCAGGCTC ATTCATTTTC CAAAGAGAAC CATTTACAAG TTAAACTCTG TTCCTCCATT

169801 CATTCATCCT CCCAAATATT CATTTATTCT CCCTAGTAAT CATTTACTGC CCCTCAAAGA

169861 ATTACCTATA TTCTCCTGAT ATCACCCTTC CCCTCTGAAA TAAATATGTA TACATGTATA 

169921 AACGTTATAC ATACATATTT ATACAGTATA CATACATATT TATACATACA TACATATGCA 

169981 TACATATTTA TATTTATGTA TTTATACATA AGTATTTATA AATAAGGCTA TATAAGTATC 

170041 TACCCCCATT GGCAGAGGGG GTAATCACTC TGTGATTCTA GCCCATGTAC TTGTTAATAA 

170101 ATTTGTATGC CTTTTCTCCA ATTAGCCTGC CTTTTGTGAG TCGATTTTTC AGTGAACTTC 

170161 AGAAGGCAAA GGGGAAGTGT TCCCTTGGCT CCTACACCAT CATGACAATA AAATTTGACT 

170221 CCACCTCGAC CCCCCCCATC CCCCACAAAG AACAACAACC AACACTGGTT AATAAGGTCG 

170281 GTTGTTTTTT GTTTGTGTTT TTGTTGTTGT TGTTTTTGCT TTCAGGAGCA GAGGTATAAT 

170341 AGGCAAAAGA AAGAGAAAGG AGAATAGTGA ATACCTCTTC TGCAGAGAGG GGTGCCTAAG 

170401 TGGGACTTCC CTGGCTAATA ACGTCTTGCT AGAGACCCAA CCAGGAGGAT AATGGAAGCA 

170461 ATCAAGGCAA CCAGAACAAC CAGAAGAACC GGTTTATCCT TTTTGTGCCC TCTCCCTAAA 

170521 CTGAGGGAAT AAGAATTGGA AAGAAGGCTG CAGAGCAGAG GGTTTGCTCC TGAGGAGCAG 

170581 TTATTTCTAT GGGATCAGAG CTCCTGCAGA ACTGGGGAGT TTACTTTTAC TATCTCTTCT 

170641 CCAGGACAGG ACCTATCTCA AGAGACATGT TCAGAGTGAT TGCAACATAA AGAGTTTGCA 

170701 GACCCAAGGA GGTAGGGAAG GCAGAAAGAA GATGGGGGAG GCCAGGGATA GGCAACAGAG

170761 GAGTGACCAG GAGCGAAAAA GCCTGCCTCT TCTGAGAACC TAGCTGGGCT CTCCCTGTAC 

170821 CCCCGATCCC TCCCCCCCGC CCGCCCCCAC ACCCCTACTC CTGGGAGCTC CTCTAGGACA 

170881 GGGGCAGAGT CAGGAGGAAG TTTGAAGAGT GCCTAGAATA AAAAACAGTA ATTTAACTAC 

170941 AATTACCGGG TAGGCTGTTT TCCTCTCACA ATTTGATCAG TCTCTTGAAG CCACACAGAA 

171001 TTTCTTCTGA AGACGTGTAT TCCTTGGCAG GCTATTTCCT CCAGTGATAC ACCAGGCCCC 

171061 TCTCTGCTGG GGTCACTGCT CTTCTGGGGA GATGGGGCTC CCCTCCTTCC AAGGCTCCAG 

171121 GGTTCCTGTC CTGGGCCCCA CTCATCTAAG TTCTGAATCT TCTGAGATTT GGTGTAAAGT 

171181 CTGGTGAAAG AAAGAGCAGG AAAGAGGTGA GAGCTGTAAA ACAAAGAAAG TCCTGACCAT 

171241 TTTCAGAGTT GGAGGGGCCC TGCTGTCACG AAATATATTC CCCACCCCAC TTGCCATCAG 

171301 TACACACTCA CATATCCACT GAGAAAACCT TAGCCTGGAC CTTTTCCGTA ACCTTCACTG 

171361 CTCAGACACT TACATATTCG CTGCTAGTCC CCTCTGTTGC TGCCACTTCC TGGGTCAGGA 

171421 AGTTAACTCA GACCGGATTA AACTGAGAAG TGAAACTACT GTGGGAGGCG GGGCTCATAA 

171481 GATTTAGGAG AAAACTAGTG ACGTTGTTCA TATCATTTGC ACTCCGCCTC TCCGGTAAAG 

171541 GAGGGGGAAA CGTAGGAAGA AAATATCCTT CTTTTACAGC AATAAAAAGA AGGAACCAAT 

171601 TAATAACCCT GTAAACTATC ATGTGACCCC AACACAGAGT ATCTAAAAAC AGGAAGCCTG 

171661 CAGAGGTTCA GTTCACAGAC TCTGATTTGA GATCTTTCTA CTTTTGCCAC CAACTCCCTT 
171721 GGGAGTCCTT AAGCCTTCCT AGCTGATGTT ACTTCTTTTG CTATTTATGG GTTGCTTGTG

171781 GTTCTATAAC TGCTCTGAAG GGTGTGGTGG AAAAAGGGGT GGTAACAGCA GTAGGACTCA 

171841 TTGGCATCAC AAAATTCATC TGAGTCAGCT TTCTATTCTT CTCTGTCCCG TTCTGTGTCT 

171901 TGTTTTTCTC CTTGCTGTCC TTCTGCAGGA CTCAGATCTT CTTCAATAGC GAGGGTCAGC 

171961 CAGGATAGAA AATGGGAGTC ACTAGTGGCC CAGCAGTGAG TGCCCCCAGC TTAGAGCTGT 

172021 GTGGGATCCC TGGGACCATC ACTCTGCTTT GTGCTTTGTG GAGAAAAGGC TGTGGGGTCC

172081 AGGGTCAAGT CCTTAATGAC TTAGCTCCAG CTTCTCCACT TCAAAATGAA AGGAAAAGTA 

172141 CTATCACCAC CCGTTAGAAT TATTATTTCA TGGGGAAAAA AGATGGATTA CTATCTCACA 

172201 ATAAGAGCTT GTCACATTTA TAAGTCTCAG GTGTAAGAGG CATTTATGAT AACAACATAA 

172261 TAAATGCTGG CTTAAGTAGA TGCAGTGGTC CAAGGGAACC AGTAAGGGGA GCTCAGGACA 

172321 CAGGTGGGAG GAGAAATTAA ACTTGAATTC TGGGAGCCAC TGGCCTGTCT GGGCCCCTGG 

172381 CCTGCCTGCT GACCCTGATA GCCAATGGAA CATGGAGTTT GGCCCAGCTG CAATCCCTCT 

172441 GGTCCAACTA CTCAAAATAA AGGCAAGATT GGGAAACACG TTCCTTTCTT CCTATACCAA 

172501 GCAGAAGACT CTTCAGCACT GCACCCTCCT GGGTGCTCAC AGAGCCTTCT GTTGTTTTGC 

172561 CACCTACGAT TCATCATGCC CTGGCATGAT GGTTGCAGAC CCCATGCATA GCATGGGACA 

172621 TTCTACTCCT GAGGCAACCA GCACACAGAG AGAGGAGAAA GAATGAGCCC CTGAATCCTT 

172681 GGTCCCACGA TGAGTCCTTG CAGATATCTA CAACTTTCAT TGTTGTGGAT GTGACTCTGT

172741 ACCCAGGCAT GGCTCATTCC AGATCTGTCC TATTGTCAGA GGTGTTCAAA CCAGAATGAC 

172801 TCCATTTTGA ATGGGGGCTA GGTAAAATAA GGCTGAGACC TACTGGGCTG CATTCCCAGG

172861 AAGTTAGGCA TTGTAAGTCA CAGGATGAAA TAGGCAGTTG GCACAAGACA CAGGTCATAA 

172921 AGATCTTGCT GATAAAACAG GTTGCAGTAA AGAAGCTGAC CAAAACCCAC CAAAATCAAG
172981 ATGGCAACAA GAGTGGCCTC TAGTCATTCT CATTGCTCAT TATACACGAA TTATAATGTG 
173041 TTAGCAAGTT AGAAGGCATT CCCACCAGCT CCATAGTGGT TTATAAATAC CATGGCGATG 
173101 TCAGGAAGCT ACCCTATATA GTCTAAAAAG GGGAGGAACG CTTGGTTCTG GGAATTGCCC 
173161 ACATCTTTCC CAGAAAACAT ATGAATAATC CACTCCTTGT TTAGTACATA ATCAAGAAAT 
173221 AACTGTAAGT ATCTGTATTA GTCCATTTTC ACACTGCTGA TCCAGACATA CCTGAGACTG 
173281 AGTAATTTAT ACCAGGAAAA AATGTTTCAT GCTCTTACAG TCCCACGTGT CTGGGGAGAC 
173341 CTCACAACCA CAGCAGAAGG CAAGGAGGAG CAAGTCAGGT CTTACATGGA TGGCAGCAGG 
173401 CAAAGAGCTT GTGCAGGGAA ATTCCTTTCT ATAAAACCAT CAGGTCTCAT GAAACTTATT
173461 GACTATCATG AGAACAGCAG TATAAATTAC TCAGGGAAAG ACCTGCCCCC ATGATTCAAT 
173521 TACCTCCCAC CAGGTCCCTC CCACAATATG TGGGAATTTA AGATGAGAGT TAGGTGGGGA 
173581 CACAGCCAAA CCATATCAGT ATCCTTAGTC CAGAAGCTGA TGCTCTGCCT GTAGAGTAGC 
173641 CGTTCTTTTA TTCCTTTACT TTCTTGCTTT CACTTTACTG TGTAGACTTG CCCCAAATTC 
173701 TTTCTCACAC GAGATCTAAG AACCTTCTCT TAGGGTCTGG GTTGGGACCC CCTTTCTGGT 
173761 AACACTATCA AAGGATCAGG AAAAGGAAGC TAGTGAATGC TAAAAAGGAA ACAAACTACC 

173821 ATTACCAATA ATAACAGCAA GACAAAAGCA AAACGGATTG TGACAGCTGT CCCATCTCAC
173881 ACCTGTTTCC CATTGCAGGA AGGAGGGGCT GGTTCATGCA CAGAGTGGCC AATATTAGAA

173941 GCAGAGATGG GGTGCAGATG AGACTTCAGG AATATGTTGA CAAAGGCAGG CCTAGGGAGA
174001 AATCAACCTG AACTATCCCC AAGGAGGAAT GCATTATCTC TAATATGTAA AGTTAGGCTT 

174061 GATCCTGTGA TTATGGGATA TAGGAGTCCA AAGACTCACA ATGGGAAGTA GGTCACTAGA
174121 GTCTCCTTCA GAAGCTCTGT ACTGTGTGTT CCCACTGTGG GCAAGAGTCA GCACTCAGCT 
174181 ATTCCTAGAA TGCCTTTCCT CAACTCCTTC AGATTTTGCC TCTCAACTAA CCCTATCCTG 
174241 ACCACTTGTT AGCAAGTGTA CCCCTCTCTC CCTCCCAAAC ATTTTCAAAT CTATTTTGTT 
174301 CCCATGGCAC TTATCACTGA ATATTTTACT AATTTATTTT GTTTAGTGTT TGCTTCCCTC 
174361 ATGAGAATGC AAAGGGATGG ATTTTTTTCA ATATTGTTCA CTGATGAATC CCAGTAACTA 
174421 GAATATTTCT AAGCATAGTG ATGTGCATTA AATCAAAGAG TAACTTTCTG AATTGCACTA 
174481 AACACACATC ACAAGAGGTG TGTGCACATA TGTGCATGAT GCACGTAGTG TGGTGTGGGT 
174541 GTTGTGTGGG GTATGTGGTA CTGTGTGTGC TGTGTGTGGT ATGTGATACA TAGTTTGTGT 
174601 TAGTGTGATG CATGTGATGT GGTATGTGTG TGCGTGTCCA TACATATTAG GGGTGGCGGG 
174661 GATGTTAATA TGTCAAATGG TACTAGAAAG TATCAGAACT CATGGTGCTT ACTGGTTTCC 
174721 CAGAGAGCTG CTTCTCTCCC ACCTGTAGGA TATACTGATG GTTTGGACAG AGAAGAAATA 
174781 AAAAGAAGGC TGTGACCTAC TGGGCTGAGG AAATAAAAAC GAAAGTAAAA GAAGAGCTGG 
174841 GAAAAGAGAG TGGAGGGGCC AAGGGAAATT TCCCCTTTGG CTTCTGGGGA AACTTTGCTG 
174901 AAAAATCAAC TCACAAATTT ATTAACATGT ACACAGGGAG AACCATAGAA TGATTATCCA
174961 CTTCCCAAGA GGGCTTAAAA GCTTATATAT TATCCTGGCA AAACAGATTA TGGGAGGGGA

175021 AGAAGAGAAA CTCTGTTGAT GGGATTACTG TTGCGGATTT TTGCTCCTTC GCTCAGCTAG 

175081 GTCCGGGTTT TTGTCTCACA GCCAGGAAGA ATTAGGCATG CAGCCATCAA AGAATGAGTG 

175141 GAGTAGAATT TATTAAGTGA AAGGAAAGCT CTCAGCAAAG ACAAGGGTCC TGAAAGCAGA 

175201 TTTCTGGTTT GCTCTTCACA GTTGAATACT AGGGCTTAAG ACTCAAATTC CTGACAACTC 

175261 CACCCTGTCC TACCAGTGCA TGCAGGCCTT TAGACTGAGC TACTCCATAT TGATTAATTT 

175321 CCTGAACTGT GCATGTGTTA AGGAAAGGAA TCATCCACTG CAGGCATGTT TAGGCAAGCC 

175381 CCCTGTGCAA GTTCCCTTAT CTGCACAAAA CATCCGGTGT AAGCACTTGT GGGGCAGGTC

175441 AGAGGTTCTC TGGGTACCAT TCCCTTACTG TCTGCCTAAA GCAAGCTGGC CAACTCCTTT 

175501 CATTACTAGG GAGAGTAAGT AGATCAGGGA ACAGAGATTA ACTTGAACAT TATCTTGTGA 

175561 AAGTCCGTTC GGGCATGGTT ACATTCTTGG TCTTACAGGA AGGGTAAATA AAAATAATTG 

175621 CTCTTTTTGG TGGGTCTGGA TCTTAGGTAG ATAAAGAAAC TTTAATTCCA CGATGTGTTT 

175681 TGGTAGGGAT AGTTGGTGGC AGGGATGTCA GAGAGACTTT GAGGCTTCTT CAGTTCAATA 

175741 TGACCAAGGG CCATATATTA GGGTATCAAT TTCTGAGCCC CAACAAGAGC TTAGGAGAGA

175801 TGTGATAGCA TCACAGTGTG AAAGCAATTT TTTGTTTGTT TTTAGAGACA GGCTCTTGCA 

175861 CTGTCACCCT GGCTGAAGTA CAATGGTACG ATCACAGCTC ACTGTAATCT TGAACTGGGT 

175921 TCAAATGATC CTCCCATCTA AGCATTTCAA AGTGTTGGGA TTACAGGCAT GAGCCACGGT

175981 ACCCAGCCTG AAACTGCACC CACTTTCTGA TAAACTTTTC AAATGACTAA AGGGGAGAGA 

176041 GTAAGCACTA CTCAGAGGTA GGAAGAAAGG ACACAGGATT ATAGGATTAA AACAACAACC 

176101 ACCAAAAAAA ACCAGACCGG TGTGGTGGCT CACACCTGTA ATCACAGCAC TTGGGGAGGC 

176161 TGAGGTGGGG GGAGTCACTG GAGGCCAGGA GTTCGAGACG AGCCTGGCCA ACATAGCAAG 

176221 ATGCTGTCTC TATTAAAAAA AAAAAATACC TGCCTTGAGC TAATCAGAAT CATGGACCCT 

176281 GACAAAGGAT GTCCCAAAGT AAGTCTTAGC ATTTTTTTTT TTTTTTTGAG ACAGTCTCGC 

176341 TGTGTTGCCC AGGCTGAAGT TCAGTGGCGT GATCTCGGCT CACTGCAACA GCTGCCTCCC 

176401 AGGCTCAAGC AATTCTCCCT GCCTTCAGCC TCCCAAGTAG CTGGGATTAC AGATGCCCAC

176461 CACCACGCCT GGCTAATTTT TGTTTTTTTT AATAGAGATG GGGTTTTGCC ATGTTAACCA

176521 GGCAGGTCTT GAACTCCTGA CCTCAAGTGA TCTGCCCACC TTGGCCCCTC CATAGTGCTG 

176581 GGATTACAGG CGTGAGTCAC TGCACCCGGC AAAGTCTTAG CATTCTTTAC AAACAGTTTG 

176641 TACCCGTATC TCTAAAAGGG AGTAGTGAAT TTCACCCCAA AATGTGGCTT CCTGATATAA 

176701 TGAGTATTTT GAATGAAAAA CTCTTAGAGA TCAACAGACA CTAAAGAGAC TTTTCCCTAG 

176761 GTACATAAAA ATAGGATGGC CCCACCAGCG AGAACAATTG TTCTTTTCTC CCTCTCTGTT 

176821 ATCTCATTGT GCATTATAGG AAAGACCAAG AATGTAACCA CACCTGAACA GACCCTTTTA 

176881 TAAGATAATC AGTCTCTAAG CATCATTTAA ATTCCAAGGA GAACTATTTA CAAATTTATC 

176941 TGTTCTTTGA TCCAATTAGT CTCTCCTGGT AGTTACATAT TGCCCCTCAA CAGAATTCCT 

177001 CTTCTTCTGT TTCCCATAAC CTATTTTGCA AGGATCAAGC CCCTGTTATT TCTTCAACTT 

177061 CAAGGTGGCA TATAAGCTTC TAAATTCCAC TGGGATATTG GTACTATGTG CATGAGGAGA 

177121 ACCACAGAGT AATTAAATTG TAAAGCCTTT TATCTTATGA ATCTGCCTTT TTTTGTGTTC 

177181 ATTTTTCAGC AAAACTTCCA AGGGCAAAGG TATAAAACAA AAATAAAATT CTAAAGCCCC 

177241 CCAACCATCT GAATAGACTT TCTCTTCAGT CAGGCTTCTT AAAATGTAAC CTGAAAGACT 

177301 GGCTCAGGCC ATTAAGGGAA GTGGGGGTTG AACATGCCTC ATTATTCCTC TCTGGCATTA 

177361 ACATCAACAC AGCTTTTAAG TCTGATAAGA AACATTTTAC AACCTATTCT CTCTGAAGCC 

177421 TGCTAGCTAA AAACTTCATC CCATAGTACA ACTTTGGTCT TCACAACCTG TTATCACAAC

177481 CTAGTGCTCC TTTCTATTAA TCCCAAATCT TTATACAAAC TCAACCAATT GTCATCACCT 

177541 CCACCCCACT CCTCCGCTGC TTCCAGTTGT CCCGCCTCTC TGGACCAAAC CAGTGTACAT 

177601 TTCTTAAACG TATTTGATTG ATGTCCCATG CCTCCCTAAA ATGTATAAAG CCAAGGTGCA 

177661 TCCCAACCAC CTTGAGCGCT TGTTCTCAGG ACCTCCTGAG GGCTGTGTCA TGGGCCATGG 

177721 TCACTCAAAT TTGGCTCAGA ATAAATCTCT TCAAATGTTT TACAGAGTTT GGCTCTTGTC 

177781 ATGACACAGA TGACTGCTTC ACTGAAGCCT GCTCTGGAAG TGAGTGGGGG TTTTGCAAGG 

177841 ATAATTTTCC CCGGATAGCC CCAGAAGCAG CTAGTAATAA TACACTTAAA GGTAGCTAAA 

177901 ATGCATTGAA CACTTGTTTT GTGCCAGACC TATGTCAACA TTTGCTTTGT GCCAGGCTTA 

177961 TGCCAGTACT CCTGATTTGT TAATACATTC TAAATAAAAA TTCTGGAGTT TCAAATATAA 

178021 TAACTGAAAA ACAGAAAATA AATAAAAATA TATAATAACT GAAATAAAAA TTTACTAAGG

178081 CTGGGGATGG TGGCTCACTC ACACCTGTAA TCCTGTTACC GGAAAGGGGT CCGTCCAGAT 

178141 CCAGACCCCA AGAGAGGGTT CTTGGATCTC ACACAAGAAA GAATTCGGGC GAGTCTGTAA 
CATACTAATA
GAATGTTTGG 
CATACATATT 
TAGACACATC 
184681 GACCAAAGTG GAATGAAGAT AGAAATCAAT AACTAGGCTG GGCGTGATGG CTCACGCCTG

184741 TAATCCCAGC ACTTTGGGAG GCCAAGGCGG ACAGATCACG AGGTCAGGAG TTTGAGACCA 

184801 GCCTGACCAA CATGGTGAAA CCCTGTCTCT ACTAACAAAA TACAAAAATT AGCCAGGCCT 

184861 GGTGGCATCT GCCTGTAGTC CCAGCTACTC GGGACACTGA GGCAGGAGAA TCACTTGAAC 

184921 CCAGGAGGCA GAGATTGCAG TGAGCTGAGA TCGCGCCACT GCATTCCAGC CTGGGAGACA 

184981 GAGCGAGACT CCGTCTCAAA ATTAAAAAAA AAAAAGAAAC TAGAAAAATA AGAACAAATC

185041 AAACCCAAAG CAAGCAAGAG GAAAATGAAA AATTTCAAAG CAGCCAAGAA CAAAAGGCAC 

185101 ATTATGTACA GAAGAACAAG TGTATAGATC ACATATTTCT CATAGACACA ATATAAGCAA 

185161 AAAGACAGTG GAGCAAAATT TTTTAGATTA ATGAAAGACC TACAATTCTG TACCAAGCAA 

185221 AAAAACTCCC CCCAAATGAG GGTGAAATAA GACAATTTAA TACAGAGAAA AGAGGAAGGA

185281 ATTTATCTAG TCATATGTGA GAGTTTTATG ATACATTTTG TACTGTATAT GTGGATGTTT

185341 TCTATTTCAT TTAAAAAATC AACCGTGCAA TTAAATGGTA GATTGTCTTG CTTCTTTTTG 

185401 ATTGACACAG TCATTAACTA AAATATTGTA GTATTTTTTT ATCTCCCTGC CTAAAGGCAA 

185461 TAAACATCTA ATCAGCAGAC TAGAACAATA AAAAATATTT TTTAAAAGTC CTTTAGGCAG 

185521 AATGATAAAA GTCCCTTAGG CATATTGAAA TTCCTATTTA TACAAAGGAA TAAACAGTAC 

185581 TAGAAATTGT AACTATGTGA GTAAACAGAT AATATTTTTT CTCCATAAAA TGTGGTTGAC

185641 TATTTTCACA AAAATAGTTA ACAATGTAAT GTGTGATTTA TAGCATTTAA AAGTAAAACA 

185701 GGCCGGGCAC AAAGGTTCGT GCCTGTAATC CCAGCACTTT TGGAGGCCGA GGCGTGCAGA

185761 TCACTTGAGG ACAGGAGTTC AAGACCAGCC TGGCTAACAT GGCAAAACCC CATCTCTACT 

185821 AAAAATACAA AAATTAACCA GGCGTGGTGG TGCACGCCTG TAATCCCAGC TACTCTGGAG

185881 GCTGAGGCAC AAGAATCACT TGAATCCAGG AGGTGGAAGT TGCAGTGAGG CAAAATTATA 

185941 CCACTGTGCT CCAGCCTAGG CAACAGAGCT AGACTCTGTC ACACACACAC ACACACACAA 

186001 AAGAAAAGTG TATGACAACA ACAGTGCAAA AGAAGTGGAA ATGAAAATAA TGTTATTTTA 

186061 TATAAGTGGT ATACTTTTAG ATGAACTACG ATAAATTAAT GATGTATACT ATAAACTCTA 

186121 AGGCAACCAC TGAAATAATG AAACGAAGAA TTATGGCTAA CAAGCCACAA AAAGAAATAA 

186181 AATAGAATGA GAAAAAATAT TTAAGTTGTT CAACAGATGG GAAAAAAAAG AGGAAAAAGA 

186241 GAACAAAGAA CAGATGGGAC AAATGGGAAA GTAATAGCAA GATGATAGAC TTAACTCTAC 

186301 CCATATAGAT TATCACACTT AAGGTAAATG ATCTAAATAC TCTAATACAA AAGCAGAGGT 

186361 TGTCAGATTG AATTAAAAAA ACAGACAACA ACAAAAAAAA GCAAAAAAAG AGCCACAACA

186421 TGCTGCCTAC AAAAAATTCA CTTTAATATA AAGACACAAA TAGTCTAGAA CACCATCACT 

186481 TTTAACCTTA TTTACTCAAA CCTCCTGATC CCTATTTATT TATTTATTTA TTTATTTATT 

186541 TATTTATTTA TTTATTTATT TTTGAGACAG AGTCTGACTC TGTTGCCCAG GCTGGAGTGC

186601 AGTGGCACCA TCTAGGCTCA CTGCAGCCTC TACCTCTCGG GTTCAAGCGA TTCTCCTGCC 

186661 TCAGGCCTCC CAAGTAGCTG GGACTATAGG CACATGCCAC CATGCCCAGC TAATTATTAT 

186721 ATTTTTAGTA GAGACGGGGT TTTGCCATGT TGGCCAGGTT GGTCTCAAAC GCCTGACCTC 

186781 AGCCTCCCAA AGTGCTGGGA TTACAGGCGT GAGCCACAGC ACCCAGCTCC TCTTCATTTA 

186841 TTCTTGCTAC GCTTCCTCCA ATCCATTTTG TGCATTTGAT GATTTTGCCA GTAACTTCTT 

186901 TATTTTTCTG GTAAAATTAC TTATGGGTCA CTGAGGACTG GGATGTTCTT TCTTCTAGAG 

186961 GGGGTTTGTG TCTGCTTTTG CCAGGAAGCT GGGGTACCAC CAGTCAAGTA TTACTTTAAA 

187021 CTCAATTCAT GAATTGAGAC TTTTTTTTTT TTTTTTTTTT TTACGCAGAG TCCTACTCTG 

187081 TCACCCAGGC TGGAGTGCAG CGGTGTGAAC ATGGCTCACT GCAGCCTCAA CCTACTGAGC 

187141 TCAAGCAATC CTTCTGCCTC ACCATTCTGT ATAGCTAGGA CTACAGGTGT GTGCCACCAT 

187201 GCCTGACTAA TTTTTTAAAT ATTTTTTTTA GAGATGGGGC TCACTTTGTT GCCCAGGCCA 

187261 GTCTCGAGCT CCTGGGCTCA AGTGATCCTC CCACCTTGGT CTCCCAAAGT GCTGGGGTTA 

187321 CAGGCATGAG CCTCTGTGGC TAGCCAAGAC TTTTTATTTT TTAGCCTAAA TGTGTATAAA 

187381 AGTTGGCTTG TGGTTACAAC TTATCAGGAT TGATGATCTC TCTCTCTCTC TCTCTCTCTC 

187441 TCTGTCTCTC CCCACCTCTC TCACATCCCT TGCTCTGCTG AGAAGCAGAG CAAACATTCT 

187501 AGCAGTTTCC AGAGAGTAGG ATGGGATTAC TTCTAGTTTA CTTTTATCAT CCTTTGGGAT

187561 CGCAGTATTA CTGGGAGAAC ACAAGTATCT CTTATTAGAC ATACCACCTT TGTAGAATCT 

187621 GGACTTTCAT TTTAGACTTT ATTTGTTTTC TACTATAAGC AATTTAAGTT ACAGATCTCT 

187681 CTACACACTG TTTAAGTTGC ATCCCATGAA TTTTGATGTG CTTTATTGTC ATTATTATAT 

187741 AGTACAATGT ATTTTGTAAT TTTTTGTGAT TTGTTTGGAG AGATTGATTA ATTAGAATGA 

187801 TGTTTAATTT CCAAATATGT GTGTTTTTTT CTACATTTCT TATTTTTATT GATTTCAAAT 

187861 TTATTTCTAC TGTAGTCAGA TTTAATAATT CATTTATTTT TATTATTTTC ATTTTTTTAG 



Figure 8 (Page 58 of 73) 



SUBSTITUTE SHEET (RULE 26) 



WO 98/14466 



PCT/US97/17658 



74/162 



187921 AGACAGGGCC TTTCTGTGTT GCCCAGGTTT GTCCCAAACT CCTAGTCCCA AGCAGTTCTC
187981 CTGCCTCAGC CACCCAAAGT GCTGGGATTA TAGGCACGAG CCACCCGTGC ACAACCAACA
188041 ATTCATTTAA AAAGTGGGCA AGTGAACTGA ACAGACATTT CTCAAAAGAA GGCATACAAT
188101 TGGCCAACAA ATATATGAAA GAATGCTCAA CATCACTGTA TTAGTCTGTT TTCATGCTGC
188161 TAATAAAGAC TTAACCTGAG ACTGGGGAAT TTACAAGAGA AAGAGGTTTA ATGGACTTAC
188221 AGTTCCACAT GGCTGGAGAG ATCTCACAAT CATGGTGGAA GGCAAGGAGG AGCAAGTCAC
188281 ATCTTACATG GATGGCAGCA GGCAAAGAGA GAGCTTGTGC AGGGAAACTC CCGTTTTTAA
188341 AACCATCAGA TCTCGTGAGA CTCATTCACT ATCATAAGAA CAGCATAGGA AAGACCCGGC
188401 CCATAATTCA GTCACCTCCC ACTGGGTTCC TCCCAGGACA CATGGGAATT GTGGGAGTTA
188461 CAATTCAAGA TGAGATTTGG GTAGGGACAC AGCCAAACCA TATAAATAAC TAATCATCAG
188521 GGAAATGCAA ATCAAAACCA CAATAAGGTA TCATCTCACC CCAGTTAGAA TGGCTATTGT
188581 CAAAAAAACA AAAAATAACA AATGCTGGTG AGGATGTACA GAAGAGGGGA CTCTTATGTC
188641 CCACTGGTGG AAATGTCAAT TAGCATAGCC ATTATGCAAA ATAGTATGGA AGTGAGGTAG
188701 GTTACATAGG GTGGTCACAG CCTCCCTTGA AAGGAAACAA GAAACTTGTC AAATTGATGG
188761 AGAGAACAAA TCTCTTGACA TTACACAAAC TGCATCTGGG GCTAGTGGTT AGAATATCCT
188821 CAGTCAAGGA GGTAGAAGAG CAGGAGGGAA AATCCCTAAG TTCGTGCAAG TGCAGAAACC
188881 CACAAGCTGT GTTCTCAGGT TGACATATAC TCATTTTAAT AGTAAGAAAC ACACCCTTGG
188941 GTAGAGAATT AAAATGCTAA TAATACATGT GATGTATGTA CTAGCGTGTA TGGCAATATT
189001 GCATGCACAT TCAAGAGACC ACCCAAAACA TATTTAACAA CAATGCCCAT TCCCACCCCC
189061 TCATGGATAA TCACGTAGGA CTCCCATAAC GGGAGTTTCT TCAGTGTCAA TTGGTGCTGA
189121 AGTAGCCGAC CCTGACTCTG CTATCAGCGT GTACTTTCAC CTTGCAATAA ACTCCTTTGC
189181 CTACTTTTAC TTTGGACTGG CTTTCAAATT CTTTTGTGCA GGGAATTCAA GAATCTGAAC
189241 CAGCCTACTG ACAACAGAGG TTTCTCAGAA ACCTAAAAAT AGATCTACCA GATGAGGCTG
189301 AAAATCTGCT ACTGGCTATT TATCCAAAGG GAAGGAAATC AGTATACAAA GAGACACCTA
189361 CATCCCCATG TTTATTGCGT CACTCTTCAC AAGAGCTGAT ATATAGAGTC AACCCTAAAT
189421 GTTCATTAAC AGACAAATGG ATAGAAAATG TGGCATATAT ACACAATGAA ATACTATTTG
189481 GCCATGAGAA GAATGCAATC TTGTCATTTG TGGCAACGTA GATGAAACTG GAGAACATTA
189541 TGTTAAGTAA GATAAGCTAG GATTGGAAAG ATAAATACTA CATGTTATCA CTCATATGTG
189601 AAAGTAGAGA AAAATTTTTA GCTCATGGAT TTAGAGAACA GAACTGTGGG TACCGGAAGC
189661 TGGGAAGGGT AGCAAGGAGG GGAGGATAGG GAGAGGTTGG TTAATGGTGA CAAAATTACA
189721 GCTAGATTGT AGAAATGAGT TCCGGTGTTC TGCACCATTG TAGGGTGCAT ATGGTTAACT
189781 CTCATTTATT GTATATTTTC AAAAAGCTAG AAAAGAATTT TGAATACTCA CAACAAAATA
189841 AATGATAAAT GTTTAAGGTG ATGGATATAC TAATTACTCT GATTTGATTA TTACACATTG
189901 TGTACACATA TAAAAATATC ACTCTTTATC CCGTATATAT GTACAGTTAT TATATGTCAA
189961 CTAAAAATAA AAGAAAAAAA GAATATGATC TATCATGATG TATATATCAT GTGTACTTGA
190021 GCAAAATGTG CATGCAGATA TTGTGTATAA TGTTCTATAA ATCAATTAGC TCAAGATAAT
190081 AGATAGGATT GTTCAGATCT TCTGTGTCTT TACTGATATT TTGTCTAGTT ATTGCATCAT
190141 TACCAAAAAA AGGGTGTTAA ACTCTCCAAA TGTGATTGTA GAATTGTCTA TTTTGTCTTT
190201 TCTTTTCCAT TTTTACTTTA TGTATTTTGA AACTCTGTTA TGACATTTTG CTATGTATTT
190261 TAAAACTTCG TTATGTATTT TGAAACTCTG TTGTTAGAAT CATACATTTA TGATTATTAT
190321 GTTTTCTTGA TGAAATGACA CTTTTCTATT GTCATTGTTT TTGTTTTTTC TGAAATGGAG
190381 TCTCACTCTG TTGCCCAGGC TGGAGTACAG TGGCACAATC TTGGTTCACT GCAACCTCCA
190441 CCTCCTGGGT TCAAGCGAGT CTCCTGACTC AGCCTCCAAG TAGCTGGGAT TACAGGCATG
190501 TGCCAGCATG CCAAACTAAT TTTGTATTTT TATTAGAGAC AGAGTTTCAC CACGTTGGCC
190561 AGGCTGGTCT CGAACCTCTG ACCTCAGGTG ATCCGCCCAC CTCGGCATTT TTATTTTATT
190621 TTATTTTTTT GAGACAGAGT CTCACTCTGT CACCCAGGGT AGAATGCGGT GGTGTGATCT
190681 TGGCTCACTG CAACCTCCGC CTCCTGGGTT CAAGCAATTC CCATGCCTCA GCCTCCCGAG
190741 TAGCTGGGAT TACAGGCACA TACCACCATG ACTGGCTAAT TTTTGTATTT TTAGTAGAGA
190801 TGGGGTTTTT CTATGTTGGC CAGGCTGGCA ACTGACTCCT TTAACAATAC AAAATATCAC
190861 TCTGTCTCTG GTAACACTCT CTGTCTTAAA CTCTATTTTA GCTGTTATTA TTATAGCCAT
190921 TTTAGTCTTT TTATGCTTTC TGTTTGCATA GTGTATATAT TTTAATATGT TTATTCTCAA
190981 GTTATCTGTG TTTTTATATT TAAGATGTTT CTCTTCTAGC CAACGTGTTT GGTTCTTGCA
191041 TTTTTAAGTC GATTCTAACA ATCTTTGCCT TTCAATTGAA ATATTTACAC CATTAACATC
191101 TAACATTAAC ATTTATTTTT CTTTCCACAG TACACTGGCT AGCATCTCCC ATATAATATT



Figure 8 (Page 59 of 73) 




191161 GAACATAAAG TGTGATAACT GACATCCTTA TTTCATTCCT ACTCTGAGTG GAAAGGGCAG 

191221 GGGTGGAGAA AGCATTCAAC AATTTGCCAT AATTATAATG CTTTTTGTTA CACTGTTTTC 

191281 TTCTGCATTA AAAAATATCA TTACATTTTG CATGAATTAT TAGGAGAAAA TATTTTCCAA 

191341 TTTTCCTGGA AAATGCCATA ACCACGTCTC TCAATTTTGT TTCCATCTTT CTTCCACATT 

191401 TTACATAACC TACATAAGAG ACACATTATC AAGTATATTT TACATGGCTT CTCAGTGTCT 

191461 TCTCTGTCTG CTAACAGGTT TACCAAGAGA TGGCACTCTT GTATTTCTGG TGGCTATGTC 

191521 CATATCGTTT TGCCTTTAAG ACAGCGTAAC TACTTCTTTC ACCAGTATTA AAGACATGTA 

191581 CATTTGATCT GGTTCTTGTG GATGATTTTA AATGACTCAA GCTAATAATC CTAATTTTAC 

191641 CTAAACACTC CATTATTTTA AAATGTATTC CTTTATGCCC ACAATAAACA TTTATTGACA 

191701 TTAGGCTGGA CATTAGGCTT CTCTATGGCA GACATTAGGC TGGACCCTAG CCATATATCT 

191761 ATTGAGGGAA AAAAAATTAT TTTCTATATA AGTTTCCAGA AAGCCAAGAT GTGTTTTAAA 

191821 AACAAAACAA AACATTACAT TCTAAATGCT GTAACAAGAT AAGAAAAAGT GTTGAGGCTG 

191881 AGAGAAGAAC AAAGCAGCAA GCAACTCCTG GAAGGACCAC TGCTGCAGAG GTAATAACTG 

191941 GTGAACCATG TTTTGGAGAA GGAAAAGGTC ACCAAGAGAA GGAGGGGGTC CAGGGTGTTC 

192001 AGAAAGATTG CATGCATAAA GATCAAGGGT AATAAAAAAA ATTCCGTATT ATGTAAATGT 

192061 GAAGTTCCAG GACCATGAGC TTGGAGAGCA TGAAGTACAG GAGGAGGGTT GGTTTCAAAT 

192121 AAATCTGGGA ATGAAACAGT GAAGCCTCTG GCAGAACTCA CATCTCTTTC CTCCCCTCTT 

192181 CCTTGCACAT TCCCTTTATG GAGTAATTGC AGGGATGGGA AAAGTTCAAA ACCACCACTG 

192241 AGCCTAGGAA GTGCTAGGGT AAAGTGGAGA ATGAACCTGC GTGATTTGCT CATCCTAAAC 

192301 TAGGTTCTTC TAGGAGAGCC CTTCCCCATA AAATCTGCCC TCCTCGAAGG GGCCCAGACA 

192361 GCCTAAGCTC ACCTCCCAAA GACCCCTTAC TTGCTGACTG AATCTGATTC CACCCAGACA 

192421 TGGCCTAAAA CCCTTCCATA ACTCTATAGC CAAATTCAAT TTTAGACAGG CCTCATACCA 

192481 ACCTTTCTTC CTCTAAGTCT GCCACCCTAG GCAATTCTCA ACATTCTCTA CACACTTTGG 

192541 GGCCATAGAC GTGCTACCAA GTCTCCAGAC CTAGACCTGA TGGAGCAGTG CTGTAATGAG 

192601 ACGACCACTG GCCTTTGAAC CAGACCCTTC TCTGTGGCTC CTATGCATCT CCAACCTGTT 

192661 TTGAGCACTG CTGCCAAGAC ATCTTTGGCA CTTTGTTGTG AAGTTTTAAA ACTGAACTAA 

192721 TCTACAAAAC ACCTAACCTT TAAAAATTCA TTGTCATTTC ATATCATGAA AGATAAAGAA 

192781 AGGCCAGGAA ACTGTTCCAG GTTAATAGAG ACTAAAGAGA TAGCAACCAA ATGCAATTTG 

192841 TGATCCTGGA TTGAGGGGAA AAAGTGTTGT CAGAGACATG ATTGGGACAG CTGGTAAAAT 

192901 TTGAATTTGA ATTTAAAGAT AAAGTATTGA GTAATATAGG AAGATGATTA TCTGCAACTT 

192961 TCAAATGTTT CAGTAAGTAT ATATATATAT AAAGAGATAT AAAGACATAT AAATAAATGG 

193021 ATAGGTAGAG AAAAAGCAAA TGTATAATAT TAACAATCTA GGTAAAAAGT ATATGAGTGT 

193081 TCTTTGTACT GTTTTTCTGA TTTTTCTATA TGTTTGAAAT CATTTTAAAA TAAGAAGGTT 

193141 TTTGGGTTTT TTTTGTTTGT TTTTTGTTTT TAGAGACAGC ATCTTATTCT GTCACCAGGC 

193201 TGTAGCTCAG TGGCCCAATC ATTGCTCACT GCAGCCTCAA CTTCCTGGGC TCCAGTAATT 

193261 CCCCCTACCT CAGGCTCATG AGTAGCTGGT ACTTCAGGTG TGCACCACTG CACTCAGCTA 

193321 ATTTTTATTT TTTAAATTTT TGTAGAGATG GCATGTTGCT ATGTCACCCA GGCTAGTCTC 

193381 AAACTCCTGC CCCCAAGTGA TCCTCCCACT TTGGCCTCCC AAAGTGCTAG AATTATAGGC 

193441 ATGAGCCACT GCACCCAGCC CCAAATAAAA AAGTATTTTA TTTTAATTAA CTAATTAACT 

193501 TTGAGTCAGA GTTTCACCCT TGTCACCCAG GCTGGAGTGC AATGGCATGA TGTTGGCTCA

193561 CTGCAAACTC TGCCTCCTGT GTTTAAGCGA TTCTCTTGCC TCAGACTCCT GAGTAGCTGA

193621 GATTACAGGT GCCTGCCACC ATGCCCAGCT AATTTTTATA TTTTTAGTAG AGACGGGGTT 

193681 TCAGCATGTT GGTCAAGCTT GTCTCAAACT CCTGACCTCA GGTGATCCAC CCACCTCCGC 

193741 CTCCGAAAGT GTTGATGAGC CACCACACCC GGTCTAAAAA GTATTTTAAA ACCACAGTCC 

193801 CACTCTACCT TGTCCTACAC TACCAGGGGC TAGGATCACC CCATGTCTTC TAGGCTATGA 

193861 GATAGAGGAA TCCAAGGAAG AAGATAAGCT ACTTGGTTCC TCTATAGGGT CTTGTGTGTG 

193921 CTCTCATGTG CTCTCTCTCT CTCTCTCTCT CTCACACACA CACACACACA CACACACACA 

193981 CACATGAATA CCAGAGCTAT CACTTTCCCA GTCTAGTACT CATCTCATCC CAAGGGTTTT 

194041 GTGTTGTAGT GGTTTGCTCA TTTCTTTGTT TTGTTTGTTT GCTTGGATTA TTCTTTTTCT 

194101 CTTTTTGCAG CTGAAGGGAG AATTTCCAGG CCAGCCCTTT GGCCATTAGA GTTACAGTGC 

194161 CTCTATTCAG GCTTCATAGA GAGACCTGGG ATTCAGTAGT GGGGGGCTTT TATCCAGTTC 

194221 AAAATAATGC ATTCTCACCA AGATGTACTT TGAAATAAAA CAATACTAAA ACACAAAATT 

194281 TTATTTATGC TGAACATTGA ATCACTTTTT TCTGTATTTT GTGTAGAAAG TTATACACAC 

194341 ACAAACACAT TTGCTCCTGC TTTGTTTATT GGCCCAGGGG TATGTTTGGT AATACTTCAT 
194401 CAGGCATGAG TAGTACGTCT TGGAAGGTGT GGTCTAAAGC CTAGACTCCT ATCTGCTTCC

194461 TTCAGCATTC TCCAGTGTAT CTGTCATCTG TCTACCTTAG GATAGGGGTC TCCAGAACTT 

194521 CCATTCACAT TTAGAAGAGG GCAGCGGCTT TCTATGGAAA ATATGAACTC TCATTCATCT 

194581 CTATTCCTTC TTCTAGCTAT GGTCCAGCTC AGCTGTTTGG AATAAAGTAT CTATATGAAG 

194641 TCTGCGAATG GTTCTCAGAC TGGTTGAACA TTAGAATCAC CTGAGTACCT TCTAAAATTC 

194701 TTATTACCCA GGGCATATCT CAGAATGAGT ACCGCAGGGT AGGGATAGGA TTAGGGATCA

194761 TGATCTCTGG AGTCTGGTTT AGGCACTAGT GCTGTTTAAA ACTACGTTCA TGAGGTGGAG 

194821 GTTGCAGTGA GCCGAGATGG CGCCACTGCA CTCCAACCTG GGCGACAGAG TGAGAGTCTG

194881 TCTCAACAAA ACAAAACAAA AAAAACCAAC TACCCTTGTG ATTTGAATGT CCATCCAAAA 

194941 TTGAGAACCA TTAGGTAAGG CCAAGCTGTA TAATTAAAGA GCAGTTTTCA TTTGTCTGGT 

195001 GTGGTGGCAG CTTTTTGATA AGGGAAGTAT TGTTGCCATC CACATACCTG AGCCTCACTC 

195061 CTGAGAACAC TGGTGTGTAT GTTGCTAAAA TTCCCCAGGT GATTCTGAGG TTCCTTCCTG 

195121 GATAAAAACC ACTGACCCTG GGAATGTACC CACTGCCAAT CTCCTGCGTA AACCTTGGAT 

195181 ACTGGGAAGC CTACAGTTGA AAATATTGGG CTTGAGATCC TGAAACAAAT CTTGTATTTC 

195241 ATTAAGACTA ATATTTGGTA CAGTGCAGCA AATCAAGGGA ATTTTGGTGG CTGAGTTCTT 

195301 TTAGAACTTT TGCATTGAAA TAGGTTCAAG CAGCAATAAG TTAAAACTAC AACCTCAGCT 

195361 AAAGGATTAA AAGACACGTG AGCTGGGTAG GATGAGGTCT AAGGTTGGGT GTGGCGGCTC 

195421 ATACCTGTAA TCCCAGCACT TTGGGAGACT GAGGTGGGTG GATCACTTGA GGTCAGGAGT 

195481 TCAAAACCAG CCTGGCCAAC ATGGTGAAAA CCCATCTCTA CTAAGAATAC AAAAAAATTA 

195541 GCTGGGCGAG GTGCCAGGCA CCTGTAATCC CAGCTACTGG GGAGGCTGAG GGAGGACAAT 

195601 CACTTGAACT CAGGAGGCAG AGGTTGTAGT GAGCTGAGAT CGCACCACTG CACTCCAGCC

195661 TGGGTGACAG AGCAAGACTC CATTTAAAAA AAAAATAATA ATAATAACAA TAATAATAAT 

195721 TCAGACATAT CCAGGCATCA AACAGATACC TGGGGCAGAT GAATAGTCTT GAGATTCAAG 

195781 TCACACATGA AATTTAGGTG GAAAATGACA TTGGAGAAAT TTGAGATTAT GATGAATGGA 

195841 AATTTTTCAA AGAGGAATTT CAGGCTCTGT TCTTGAGGGG ATAGATGGAC TTCCAACAGC 

195901 AATAACACAG GATTAATGAG GACTTGGGAT GTTACATAAA TTAGAGATGT TAGATGGATA 

195961 AAGAGATAAA AGTACTCTCT CTAAGAACAT GGGACCAGAG ATAGGCTCAC TTCTAACCAT 

196021 CAGATATAAC TAGCAGACTA AACGGTCTAA AAATAAAAAT CATGCCCCAC TCCTGCTTAA 

196081 GACATTTTAA TTACTCTCAG TAACTCTTCA GTTTTTCTAC TGTGTTATCT TTAACTACAG 

196141 GGTTGGTCTG GGTGTGCAAC ACAAGAAAGC CTGGCATATA CATGGATTCA AGTGTATGCC 

196201 ATGTGCAGGT ATTCTTTCAT GTACTATTTC ATGTATTCTT TTTCACATCT GTTTTTTCCT 

196261 TCATTGAAGT CAATGGCTGA TATTAGATTC TACTATTCAT GTGTACTAGT TATATATAAT

196321 TGTTACAAAA CAAATTAGCA AAAACTTAGT GGCTTAAAGC AACACACATT TATTATTACC

196381 TAAGGTCTGT GGATAGAAGT TCTGACATGG CTTAACTGGG TTCCCTGCTT CAAGCCTCAT

196441 GTGGCTGCAA TCCAGGTGTT GGCTGAGTCT GAATTCTCAT CAGAGGCTTG ATTGTGGAAA 

196501 TTTCCACTTC CAAGCTCCCT CAGGTTTGTT GAAAAATTCA GTTCTTTGCA CCGGTAGAAG 

196561 CTTCTTGGTA GAGGCTGATT CAACTTCTAG AGGCTGTCTG CAGTTCCTGT CACCCAGGGT 

196621 GGAGTGCAGT GGAGCAATCA TAGCTCACTG CAGCCTTGAC CTCCCAGAAT CAATCTGTTC 

196681 TCCCACCTCA GCATCCTGAG TAGCTGGGAC CACAAGTGTG TGCCATCACA CCTGCCTAAA 

196741 AAACAAACAA ACGAAAAAAA ACCCCCAGAG AACTTTGTAG AGACAAGCTG GTCTGGAACT 

196801 CCTGCGCTCA AGCAATTCTC CTGCCTTAGC CTAAAAGTTC TGGGATTATA GGTATAAGCC

196861 ACCATACCTG GCATATGGCA AGTCTTGAGC AGGACAAATA CAGATGATTT ATGTCTGTCT 

196921 TCCATGGTAT TCTAGGTTAT TGTTGAGATG GTCCTCTATT GTCTTGTTCC ATCTATTGAT 

196981 TAGATAAAAC GTTGTTCCTT CTGTTATTTT TCAACAGTAG CTTTTATGTG TCTCTCTTTA 

197041 TCTTAAAATT CTAACCAAAG AGCTGCTCTT TTCTTGGTGT ACTTTACCTT TGGTTGATCC 

197101 TTCTTAACCT CTTCTTGCCC TCTGGGGCCT AAGATGAGGG CTGTTATCAG ATGTGAGTCT 

197161 ATGGGAAAGC AAGCAAGAGG TTCTTCAGCC TCCGTTCAGC CTTAAATGTC TAGGTAGAAA 

197221 TCAGTCATGG CCCTTCCAAT GTGGTACAGA CCAGATCACA GAGACAGGGG TCTCAGCCAA

197281 GGTCTTGTGG CCTAAGCCTT ATAGAAATAA TGAGTGTTTA CTTACTTGGA GAACTCCCTT 

197341 GGAATATCTT TTTTTGTGAA CCTGAGGCAA CTTTTGGTGA TTTCTTGATG TCTTGGGAAT 

197401 CTTGGTCTAG AGCCATTTCA ACCCGATTTC TTTTCATGTC AGTGGCATTT TGTGACCAGA 

197461 TAGTAAATAA GTTCTATGAT GTTCACTCAG AGAAATACAA TGACTTATGA TGCGAAGCTT 

197521 CTGTGGTTCA GCCCTTACTT CATCTTCATT CCCTCTTATC TGCATCTGTC TCCTGCTTGG 

197581 GAACAAAAGT CTGGCTTCAT TCTATGACCC CCACGTTGAG TTTCTTAGTA GCACTTACTT 
197641 TTCAATTAGG AGTGTCCTCA CTTCTATCCG TCAGACATAA CTAGCCGACT AAACAGTCTA

197701 AATATAAAAA TCATGTCCTA CTCCTGCTGA AAACATTTTA ATTACTCCCC ATCATTTAAT 

197761 TTTTTCTACT GGGTTATCTT TAACTTCAGA GTTGGTCTTG TGTGCAACAC AAGAAAACCT 

197821 GGCATATACA TGGATTCAAG TGTATGCCAC GTGCATGTAT TCCTTCATGT ACTATTTCAT 

197881 GTATTCTTTT TCACATCTGT TTTTTCCTCT AAAATTTATT TCCTTTTAAA AATGAAAATT 

197941 TTGCATTTGA CTAAATTTGT CAAATTTAGT CAAATTTGTT TAAAACCATT TTTAAAATGT 

198001 TTCCCGAAGT TTTGAGTGAA GTTAGTACTT CAGAAAAACT GTTTTGTATT TTTCCTGTGA 

198061 CCTCAGTGCA CTGCTGTGCA TTTCCATTTC TGCGTCCACA CACATTTGTT TTGAGGAAAT 

198121 ATAGGAACGA CAAGATAAAG TTCAAGCTCC TGGACATTGC ATAAAAGACC GTCATGACCT 

198181 GGTCCTGTTG ACTTCCCTAG ATTTCCCGCT ATTTCCTAAG TTGAGATTTT TGGTTTGGAT 

198241 GCTTTGTGTT TTCCTAAAAT CAAAATAGGT TTTTGCCTTT TATGATTATA CAGTAAATAA 

198301 ATGCTATTTG TGTGAAACTT TAAACAATAC AAAAAAAACC TAAGGAAGAA AGTCAGATTC 

198361 ATCTAAAAAT CCTTGTGGCC AGAATTAACT ACCTTAGTTA CTATTTTCTC TATCTCTCTC 

198421 TCTCAATGTA TATTTGGTGT AGGTATAGGG GTGTGTGTAG TGTGTGTGTA TGTATATATC 

198481 TGTTTCTATT CCTGTATGTG GATGTGCACA ACGCATCCTG CTTTGTACAC TACAGTACTA

198541 GCATTTTTCT AATGTAATTC AATATTGTTG AAAACATTTT AAAAAAGCTT GTATATATAC 

198601 ACACACATAC ACATACATGC ATGTATGTAC ATATACACAT ACAGACAAAA ATGTATCCTA 

198661 TGTATATTCA CACATGTATA CACACTCACA CATACATAGA GTTTTACATC CATAGTTTAT 

198721 AAATGTTGCT TTTTTTTGGT CACCTTTTTG CTAAGTCTTA CACTTTTTTT TTTTTTTTTT 

198781 GAGACGGAGT TTTGTTGTCA TTGCCCAGGC TTAGTGCAGT AGCGCGATCT CACCTCACTG 

198841 CAACCTCGAC CTCCCGGGTT CAAGCGGTTC TCCTGCCTTA GCCTCCTGAG TAGCTGGTAC 

198901 TACAGGTGTG CGCCACCATG CCTGGCTAAT TTTTGTAGTT TTTTTATAGA GACGAGGTTT 

198961 CACCATGTTG GCCAAGCTGG TCTGGAACTC CTGACCTCAA GTGATCTGCC TGCCTCAGAT 

199021 TCCCAAAGTT CTGGGATTAC AGATGTGAGC CACTGCACCC GGCCAAGTCT TACACATCTT 

199081 TTTTTTACCA CTAAACTGTT TACCCAAACC TGATAACCCA AGTCAACAGC TATTATGGCT 

199141 CACACAATCT TATGTAAACA AAGATACAGA TATATAGAAT TTTCTTGATT AATATTCAGA 

199201 AAAAAATGGA GTCCCTTTAT ACGTCCTTAG TATCTGCTTT ACTCATTTAA AAATGTATTA 

199261 CATTATATGA AAGTATTCAG GTCAAATGTT ATAGATGTGA TTCATTCTTT TTAACTGTGT 

199321 TATTTTTCTG CAATGACTAT GTATCACAAA GTACTCAGTC TTCCACTGAT GAAAATTTGG 

199381 GCTATTTCCA GTTTGTCTTC CATTTTTCTT TCTTCCTCTT GGATTTTCAC TCAATGTGTT 

199441 TACTAATTTA GGAAGAATCA ATAGTTTTTA TGGTATTACT TCTCCCATTC AAGAATATAG 

199501 CATATGGTAT AGTATAGTAG AGTACTTAGT TTAATTTAGC CAGATCCTGT TTTCTGCCCT 

199561 TTAATAAAAT TCTATCATTT TCTGCCTTTG AGTCACATTT TCCTTGTTCA TATAATTCTT 

199621 AAAAAATGTA TAGTTTTCAT TCTAAGGGAA CATAAAAACT TCTTTCCATT TCTATTCCTG 

199681 TCTAGTTAAT TCTACTATTG GGAAAAGTAA CTGTTAAAAA AAATTCTTAT CTTTCCAGTC 

199741 AGTTCACCAC ATTTCCTTTA TACCTTTGTA CTTTAATCCC CAGTCATGTT GAACACTTCT 

199801 TATTCCTCAC ACCAAGCCTC AACGGGTTTG CTCTTTCTGG AAGGTGCTTC CCCTGTATTA 

199861 CTGACTTATT CATACCACAC ATGGAGACTG GCGCAGCCCT GTTCTGCCTG GGAAGCCTTC 

199921 CCCTGATACC CCCAGTTGGC AGGAGTCTTC ATTTGTTCTT TTCTAGTCAC CTGTGCAAGT 

199981 TTGTATTGTT CATGTTTATC ATCCTTCATT CTAGTTGTCT GTCTCTGTGT GTGGTCTCAT 

200041 TCAGTGGACT CTGAACTCTT ATGAAGTCAT GTCATGGGTC AGATCTTAAT AAATTAATAT 

200101 TGTCGGAAGC TAATGTCATG TCTAGAATAC AGAAAATTTA TCAAAAAAAA ATATAGTATG 

200161 TTGGCTGGGC GCAGTGGATC AAGCCCGTAA TCCCAGCACT TTGGGAGGCC GAGGCAGGAG 

200221 GATCACATGA GGTCAGAAAT TCAAGACCAG CCTGGCCAAA ATGGTGAAAC CTCATCTCTA 

200281 CTAAAAATAC AAAAAGTAGC CAGGCGTGGT GGTGCCCACC TGTAATCCCA GCTACTCAGG 

200341 AGGCTGAAGC GGGAGGATCA CTTGAACCTG GGAGGCAGAG ATTGCAATGA GCTGAGATCA 

200401 TGCCACTGCA CTCCAGCCTG GGCGACAGTG AGACTCCATC TCAAAATAAT AATAATAATA 

200461 ATAATAATAA TAATAATAAT AATTGTATGG AATTGAACTG CTCTGATTGG AAATAGCTGT 

200521 TTTTTAAAAA ATTATTATTT TTTAAGTTCC TGGGTACAAG TACAGGATGT GCAGGTTTGT 

200581 TACATAGGTA AACGTGTGCC ATGGTGATTT GCTGCACCTA TCAACCCATC ACCTAGGTAT 

200641 TAAGTACAGC ATGCATTAGC TCTTTTACCT AATGTTCTCC CACACCCCCA CCCCATCCTC 

200701 CCCCAACAGG CCCCAGTGAG TGTTGTTCCC CTCCCTGTGT CCACATGTTC TCATTGTTCA 

200761 GCTCCCACTC ATAAGTGAGA ACATGAGGTG TTTGGTTTTC TGTTCCTGCC TTAGCTGTTA 

200821 ATGTCAGGCC AGAGAGGCTT AAATTTTTAA GGATCTCTGG ACTTTTCTTC TACATTACTC
204121 TGGTGGCTCA CGCCTGTAAT CCCAGCACTT TGGCAGGCTG AGGCGGGCAG ATCACTTGAG

204181 GTCAGGAGTT TGAGACCAGC CTGGCCAACA TGGTGAAACC CTGTCTCCAC TAAAAATACA 

204241 AAAATTAGCA GGGCGTAGCG GCGCGTGCAC CTATGCGCAT GCATAGTGCG CGTGCCAGCT

204301 ATTCAGAAGG CTGAGGCAGG AGAATTGCTT GAACCCAGGA CGTAGAGGTT GCAGTAGTTG 

204361 AGATCATACC ACTGCACTCC AGCCTAGGTG ACAGAGTAAG ACTCTGTCTC AAAAAAATAA 

204421 TAATAATAAA AGAAAAGGAG AACATGACCA AAGTTATGAA TAAGACTGAA GGCAAGAAAA 

204481 TTGTACGCTT GTAGAGATCA CCTAGCTTGT TGCCCTCATT GTACAGCTAA GAAAAGGCAC 

204541 CCAGGGACAT TGTGGTCAGC ACCAATTTCT CAGAAAGATA GGCAGATGAT GAGAGGGCCC

204601 TCAGTTTTTC TAACACTGAA GGAATTGCTT CTATGTTTTC TGGTGAACTC CTCCCCACTC 

204661 ATCTTGAGGA TTCCAGGCCA GAAGAATCCA CTTTAAAAAA GAAACATTTA AAACCAATTT 

204721 AACAACCAAT CAAAGGCACT TTTATAGAAA TACATTTCAT TTGCTGTAGG CCTGTATTTA 

204781 TGGATCTGAG AGGGCTAGAC TGCCAATATT GTGACTGTTT ATTATTATTG CTGTTGCTAG 

204841 TATCTAGAAT ATTATACAAC ATATAACACT TTGCAATTTA CGAGGCATGT CTCATACTTT

204901 TGTTTTCACT CCAAACTGCC CAGTGAAGTA ACATTATCCC AATTCTTCCT ATGAAACAGT 

204961 GAAAGCCCTA AGAGTTTTTG AAACTTTACC TGGTTTACTC AATTTGGGAA TGGCAGAGCA 

205021 GAATTCAGTC CTTGAATATC CTCCCACTGC AGGTTCATGC TCTTTGATCT AGGTGTAACA

205081 TTTACTCTGA GTAAACTAGG ACTCTGGGCT AACAGAGATG AAGCAAGACA GGCTGGATAT

205141 TAGGAGAATC TAAGAGCAAT CTAACGACCA TTATAATAAA ATCATGAGTT CTAGACTTAA 

205201 AAAAAGGGAA AAACCTGTTT TTTTGCTTAT GCGTATACCA TAATATTTAC ATTATTTATT 

205261 TTTTTCTCAA ATTCAACCTA TACTGTGTCA AGTAATTTTT TTTAATATAA CATTTTCCTT 

205321 TAACTTAATT TCAATTCATT TTTCTGTGTC TACTTACAAC TTTGGCACTA GAATTCACAA 

205381 TTTTTTTTTA GAGGTATATC TCCTTAAAGG GAAGGGTTCT GACACTGTTA CATGTTCTCA

205441 ATTGTTTGCA AATAGGTTAA TAATTATTCC AGTGTCTCTA AGTACATATC AACCATGCCA 

205501 GTGTTCAGCC TCCATAATTT TATTAGCTTC TGTGCTTATT TTGGAAAAAC ATTTCCCATT 

205561 ACCATGAAAG ACCTCAGTTT AGGATGGTTT GGTATGTTAG CCTGATTTCT GCATTCGTCT 

205621 CATGCAAAGG AAAATAGGAA ACGAAGAACT GAAATTACCT ATTGATACAA AATCAAAGTA 

205681 GCATTTGAAA CCATAAAACT TAAGTAGGGC TTTTCATCCT TTCTCGTTAG ACAGCAACAG 

205741 AGAATGGGAA GAAAAACTAA AGTGATGGGT TTGTGATACA ATTCCAGTAA CATAAAGAGC 

205801 AAGGAGAAGT AGTTTTGTTG TGTTTATGTT TAATATTCAA AGCTCAACCT AAAAGTATTT 

205861 TTCATTATCA AACTTCCTTC TAGAATAAAT GATTAAAACT TGATTTAAAA TATACAAATT 

205921 CTCCTTTATA ATACCTCAAA ATGGAGCTAC CCCATTGAGT TTTAAGCTTG TGATTAAAAT 

205981 ATTACGAAAA CAAAGGGGAA GTTGTAATAG GTAGAACAAG CAGTAGTCTA GGCATTAGGG 

206041 GATCTGGTGC TGGCTCTGTG CATCATGTGG TTTCAGGCAA CTTTTCAAAT TTTCTACGCA

206101 AATTTTCTTA TCAATAAAAT AAAGAGTTGG GCCAGAGGAT CTCTGAGTCT CTTTCAGCTT 

206161 TCAGTGTTTA TAAGATTGGA GAAGTTGGTG GGAAAGCTTT AAGTGGAGTG TAAGTAATTG 

206221 CAGCTGCATG TACAGTTAAA GAGTTGCCTT CAGCCAAGCC ACGGGATCTT GCATAAAAAG 

206281 TGAAATCAAA TAGAAAATGG TGCAAACTCT GGGTTTGACC ACAGATGACT TCAGCTAGGA 

206341 TCTGAGTGTA GAGCAATGAG CTGAACTCCT GATATCCAGA TGTTAGCAAG ACTTGGAGGC 

206401 CTTCTAAGGC AGAGCAACAA CCAGTATCTG TCCTGGTGCT GACCTGATCT TACTAGCAAT 

206461 TGGGCCTCCA TTTGGGTCCA TTGTACAAAA CAACAACAAC AACAACAATA AAATCTCCAA 

206521 ACACCCAAAA TTCAAAATTT AGATGGAGAG ATACTATTCC CAGAATTCTA GAGATATTTG 

206581 GAAAGCAGAA AACTATACTT GCCATGCTGA TGAAGTCCAA TTATTGCTCT TTTAAATACA 

206641 TTTAGCTACT TCTGAATATA AAATGAGTAT CTACTAATTA TTTACAAAAT CACTTGGTAA 

206701 ATATAGAAAG TCACAAAGAA TGAAGTGATC ATCCTGTTTT GTAACCCAGA AATAGTCATT 

206761 ACTGGCACTT GTGTGAATCA GTTTCTATTC CTGTATGTGG ATGTGCACAG CGTATCCTGC 

206821 TTTGTACACT AGAGTACTAG CATTTTTCTA ATGTAATTCA ATATTGTCGA AAACATTTTA 

206881 AAATAGCTTC CATCACAATA ATCTATCAAA TTGACTTGCC AGACTCTCAT TATTAGGTTA 

206941 ATTTATCTCT AACATTATGC AGTCATGAGT AATACTACAA AGGATATTTT TGGACACAAT

207001 TTTTCATCTA TGCCTTTCTT TATAATCCTT CATCCTAAGG TCACAGATTA TGAATATCTT 

207061 TAAAGTACGG ACAAGTCTTT TAAATTTTGT GTGCAAAAAC AGTGCAAAGC CTTGAATGAT 

207121 AAAATAGAGG TTTGATATAT GTGTTTTTTT GTTTGTTTGT TTTGAGACGG ATTCCTGCTC 

207181 TGTCCCCCAA GCTGTAGTGC AGTGGCACGA TCTTGGCTCA CTGCAACCTT TGCCTCTTGG 

207241 GTTCAAGCAA TTATCCTGCC TCAGCCTCCT TAGTAGCAGG GTCTACAGGC ATGTGCCACC 

207301 ACACCCGGCT GTTTTTGTAT TTTTAGTAGA GATGGGGTTT CACCATGTTG GCCAGGATGA 
207361 TCTCGAACAC CTGACCTCAA GTGATCCACC CACCTCAGTC TCCCAAAGTG CTGGGATTAC

207421 AGGTGTGAGC CACTGCACCC GGCCGATACA TGTGTTTTTA AAGTCACAGA AATTTCAGAT 

207481 GTCTTGAAGG ATTTTAAGCA ATTTAAAAAA TAAAGTCATA GAAGCTTCAA TTTAGGAATG 

207541 AATGGAAAAT TGATGATATT CTTAGGATAT GGATTTTTCC TAAAAGAAAC AAATGTATGC 

207601 ATCCCCAAAG ATAATTTGAT TAGTATACAA ATATTAAATT AAACATGTCC ATATTTAGAG 

207661 CCATGAATTC TCTTTGCCTG TCACAATAGC TGGATTTATT CACAATTGTA GTAATTAGTC 

207721 CCTGTTCATT ATAATTTTCT AGGTGATATG AAGACTTTGT CAGTCCAAGC AAGTGTCCAC 

207781 ATTGTGTGTA GCAAACATGA GAATAAACAT TTTAAACTTT TAAATGTAAT ACATATTAGT 

207841 GTTATGTAAT GTCATCCTTC ATGTTCGAAG GCACATGGAA CATTGTTCTG GTGGTACAGA 

207901 GGGGAGAGAA ACACCATCAG AATGAAAGGA AAGACCGCTC TGGAACCTTC CTCCTTAGCT
207961 CTTGAGCTTA GTTTAATTGT CCTGTCTTAT GGTCTGCTAC AAGCAATACC ACTCTTCACC 
208021 TTCGCATGCT TCTCTGTGGT TTGATAAAGT ACATGCAATT TTTCATTTAA TTCTTCCAGC 
208081 TGCACTAAGA AAGGAGCCTT ATCTTTATTG AACAGATGAG GAAATGAATG ATTAGAGAAT 
208141 TTAAATGACT AGCTCTAGGT CACACAGCTG GAACTTACAG CCAGATTTCC TTTTAACAAT 
208201 CCTGTAACCA AAAGCATACC AGTAGTGCCC CATAAAATGT AAGTTATAGA GCTGTGTTGG 
208261 GTCAAAACTT TTACTGATGC TAAGAGGAGG CAACATTAAC AAGGGGAAAT TATTTGTGTA 
208321 TTATGTTTTG GATTATGTTC TCTCCATAGA TAAAAGACTG TCGTAGTAAA AGAGATTCAG 

208381 GGCACAGGGA AACTCCACCA CAAAGCGTGG TACCATTTCC CACAGAAGCT AAATGGACGG
208441 GAAGCCTGCC ACCAGGAAAG GTAAAGCCAC TGCTCTTGTT TGCAGGCTAT GTTAATAAGC
208501 TGAAGCTTAT TCCGACACAT TTACACATCT CTGCATCACA CTGACCCTTC GTAAAGATAC
208561 TCCCAGTGTA ACATTGGAGC CAGCTCCAGC CCCTGATCCT GTTGCTTTTT CCTTAGCCCC
208621 ATGAAATCAT CTGTGAGAAA TTAAGCCAAA TAAGCAATAA ATCCTGGGAT CTAGGGAGTG 
208681 GAATAAGTTT TGGGAAAGTC TTTTTTTTTT TTTTTTTTGA CTGAGTCTTG CTCTGTCTCA
208741 CAGGCTGGAG TGCAGTGGTG CGATCTCGGC TCACTGCAAC CTCTGCCTCC CGGGTTCAAG 
208801 TGATTCTCCT GCCTCAGCCT CCCGAGTAGC TTGGACTACA GGCACACACC ACCATGCCCA 
208861 GATGAATTTT TGTATTTTTA GTAGAGATGG AGTTTCGCCG TGTTAGCCAG GATGGTCTCG
208921 ATCTCCTGAC CTCGTGATCC ACCGGCCTCG GCCTCCCAAA GTGCTGGGAT TACAGGCATG 
208981 GGCCACCACG CCTGGCCCGG GAAAGTCATT TTAAACCAAC CTATGTATGA ATCCCTACTA
209041 TAATATTCTC ACCAAGCGGC TGGCTCTTTC TCCTGAGCTT GGAAACCTCC AGTAAAATGG 
209101 AAATAATTAT TTCCCAGACC ACCACTCTTA TCTGTGAGCT TTTTTGGCCA TTAAAAATTA 
209161 TTTCTTCCAT TATATTTTTA TCTGTGTCTT CACAGGTTTT CTCTTTCTTT CACTTTAGTG
209221 CTTTTCTTCA AATAAGCAGG AAAAATCCAA TCTATCATGC ACATGGGAAC CCTTTCAATA 
209281 TTGGTCTGTG GTTGTTCCAT TTTATGGGGA TGCTTTTAAA GAAAAAATTT GTCCTTTCAA
209341 TATATTGAAT ATCTTCCAGC ACCACATCAC CTGCAAGCTT TGTAAAAATA GTTCTACATA
209401 TTAATTTTTT TTTTTTTTTT GAGATTGAGT CTCATTCTGT CACCCAGGCT GGAGTACAGT 
209461 GACATGATCT TGGCTCATTG CAACCTCTGC CTCCTGGGTT CAAGTGATTC TCCTGACTCA 
209521 GCCTCCCGAG TAGCTGGGAT TACAGGCATG CATCACCATG CCTGGGTAAT TTTTGTATTT
209581 TTAGTAGAGA TGGGGTTTCA CCATGTTGAC CAGGCTGGTC TCAAACTCCT GACCTCAAGT 
209641 GATCCACCTG CCTTAGCCTC CCAAAATGCT GGGACTACAG GCGTGAGCCA CTGCACCCCA 
209701 CGTAGTTTTT TTTTTTTTTT AAGTTGAACA TATGTGAAGG CAGGACCTAG TGACACATAG 
209761 CAATAACATT TCCAAGTAGA CATTACACTA GGGAATTAGT CGAAGTGCTC ATTTAAAGTA
209821 CCATCTCTCA AATGTATTAA AAGAGAATCC TTGGATGTGC AATACCTTAA TTCAAAGGCA 
209881 GCTCGTTATG TATAAACTCT CAAGCTTTGT GATAAACAAA TGTGCATAAC AGATGGGACT
209941 ATTCACTTAC AGCCCAGGGA ATTTTATTGA CGCTGAGAAG GTTATGTGAC TGGCTCTGCC 
210001 ACTGTCATCC CCATTCACTT CATTTTGGAG CAATATGACA TAAATGCCTT ACATGTGGGT 
210061 TTTCTCTATT TATCATGTGT TTCCTATCCC CTTGAAAGAT GGCCATATTT GCTTTACTTG 
210121 GTTATAAGAT CCCATATTCG CTGTCTTGAA GCCAACCAAA TAATTTGACA AAGTGGGTTT 
210181 GTAGTGCTGG CTATTTTGGT GAAAAAAAGA CAATGAGACT TCATGTGTCA TCCAAAGTTC 
210241 TATCAGATCG AGCTGTGAGA GAAAGGAAAA GAAAGGGGTC TCAGTCAGGA TGCTCACTAC 
210301 ATACATCTGT GTTGTTGTCT AGGTCCAGAT TTCTGTTCAT TACGCTATGG GCTGGCTCTT 
210361 ATCATGCACT TCTCAAACTT CACCATGATA ACGCAGCGTG TGAGTCTGAG CATTGCGATC 
210421 ATCGCCATGG TGAACACCAC TCAGCAGCAA GGTCTATCTA ATGCCTCCAC TGAGGGGCCT 
210481 GTTGCAGATG CCTTCAATAA CTCCAGCATA TCCATCAAGG AATTTGATAC AAAGGTAAGT 
210541 ATGATGGAAA ATAGGGCTCT TTGTTGAGAG AAAAAACTTT GAAAGGAAGG CATAGATCTT 
213841 CAGAGCAGAC ATTCTCAATC ACTATGCTAG ACTGCCTTTC CATGGTATGT GATCCTACTC

213901 AGGCCTCTAC AGCTTTATCA TTGCTGTTCT CCCCAGCCTG TCGTGCTGAG AGTATATACT

213961 CGAAGAGCAG AACTAAAATT CCATCCAGCT TCTCACTCCT AGGTCCACTA CACAGCTGCA 

214021 TCCTGCAGAC TTTTACCTCA AGCAACCCTC CTGCGTTCTT GCTTCCTTCC ATCATAGTTG 

214081 TAACCATCTC CTCTATTTGC AAATACTATC TGCTGATCTC TCTCTTCTAG ACTGGTTTCT 

214141 TTCAACCTTC TTCCCACCAA AACCAAGTTA GCTTGCTAAA ATAAAGATGG CACATTTTTA 

214201 CTCACCCGCT TGAGAATTTT CAATGTGTTC CTTCATGCTT ACAGAGTAAA GCCTGACCTC 

214261 TTTATTGCAT GAATACAAAA GTTCTTAGCC ATCTGGCCCC AACCTTGTTC CACTCAACTC 

214321 CCCTGTGCAA GCATGGCTCC AGTGGCACTG GACATTGGCT GCTCTCCACA TAGATCTGCA 

214381 CTGCACTTCC CTCTGGCTCT GCTCCCGTTA GTTTATATGC CTGGAAAGTT CTTTGCCCCT 

214441 GTTCCTTGTG CCAAAATTCC ATCTATCCTA TTGCATAGCT TATGTAAAAA CTTCCTAAAC 

214501 CTTTTTTTTT TTTTTTTTTT TTTTTTTTTG AGACGGTGTC TCACTCTTTC GCCCAGGCCG

214561 GACTGCAGTA GCGCTATCTC GGCTCACTGC AAGCTCCGCC TCCCGGGTTC ACGCCATTTT 

214621 CCTGCCTCAG CCTCCCGAGT AGCTGGGACT ACAGGCGCCT GCCACCATGA CCGGCTAATT 

214681 TTTTGTATTT TTAGTAGAGA CGGGGTTTCA AGCCAGGATG GTCTCAATCT CCTGACCTCG 

214741 TGATCCGCCC GCCTCGGCCT CCCAAAGTGC TGGGATTACA GGCGTGAGCC ACCGCGCCCG 

214801 GCCAAAACTT CCTAAATCTT ATAATTATTA TCAATTTATC CTCAGATATA CTTCCACGTA

214861 CATTGTAGTT TTATTATATT TATATTTTAC ATCTTTTTTT TCAAATTTCA GTTTGGGACC 

214921 CATTAGTGAG TCATAAAATC CATTGAGCGG GTTAAAATCA TTATTTTAAA AAATGAATAG 

214981 AATAGAATAG AAATTGTTGG AGTGCATTGG ACATGGTAAA GTTAAATATC GATTCATGAA 

215041 ACCATCGTTT GAGGCATATG TGTGTGGTTG TATGTACAAG TGTTTATGCA TATTGGTGTG 

215101 TGTGTTATGT TACCCTGTAA AATGCATTTC TTACTATAGG TCTCTGTGAA ATATGTGTCT 

215161 TGTTGTTTTT TAATGTAGAC TTCCAAAGCC TACATGGCAT TTCACTAGTG ACAATCAATT 

215221 TTATTCACAT TTTTCTCTCC AATTGGACCA GAAGCTCTTT GAGGGCAGGG GCTGTATCTT 

215281 ACCGATTTTT GTAAGTCTTT CATTTCCTGC CCCTAGCCTC ATATTAGATC ATGCAAGAAT 

215341 GCAACTGTAA TCACAAGAAA ATGCTAATGG GCTGTGATAG CAGAGAGTTA CTGTGACAAA

215401 CTAAGGGATT TAGATTTGGT CACATTGGTG TTGAGGAGCC ATTGAAGAAT CAGAGAGTGT 

215461 GTTACTATTA TTTGTTAATT TTAATTATAT CATATTACTT TACTGGGGAA AATCTGTGAG 

215521 CTATTTTAGA AATAAATACT CTCATTGCCC AATAATTCTA AGTCTGCCAC CTCACTGTTG 

215581 GGACATTGTT TAGGGAGGCC ACGAAGTCTC AGCCTTTGAT ATTTTCATAA GTGTTTTTCT 

215641 CCCTTTTTCC TTTAGGGTCA GCATTTGGAT CCTTCATCAT CCTCTGTGTG GGGGGACTAA 

215701 TCTCACAGGC CTTGAGCTGG CCTTTTATCT TCTACATCTT TGGTGAGTCA CTTTCTCTTA 

215761 AATCCTAACG CCTCCATTTC CTGAGCATCC ATTTTGGCAC CTACACCACC CACATTCTTC 

215821 CTATATGAAA GAAAATGTCC TTTATCAAAT GGAAGATGAT AAAAAATGTC AACGGTTGGT 

215881 ATCATTTTTA ATCTAGTCAC ACAACCTGAT TAACACCTTC CTGGTGGTTC TGGGAAGCCA 

215941 CACGCACAAG GTAGAGGAGT TGACTATTCA CATGGCACCC ACCGACTTGT GATGCAGTCT 

216001 TGTCCTTCCA TATCAAGCAC CTTCTGCAGA ATCTCTACCA CCACATCTGA AGTGCCTGCT 

216061 ATATGCAGTT AAGATGTCAA AGATAGTGAA GTACATTTTC AATGTGTCTT CATATTTCAT 

216121 TATAATTATT ATTTCTGTCC AAGATGCCTT TCACCTGTTC TCTACCAAGT TAATCTTGCA 

216181 AAGTTCAATT CAAATGTTCC CTTCCCCATG GGCCCTTCCA GGGCTTACCC TATCAGATTC 

216241 TGGCATTCTC TCCTTTATGA TATTTCCTCT CTAGGTTATG TTGGTGTGTA ATTATTTATT 

216301 TCTCCTTTTC TTTCCACTAG ACTGTGAAAT GCTTGAGGCA AGGAATCCAT TCTATGTTTT 

216361 CATCACTTGG GTGTCATCAT GGTGCCTGAT TTTTAGCTTT AAAATAAAAG AATCAGTGAA 

216421 TCCAGTAATT AGAGGGGATT TAAAGAAAAC TAGTCCTCAG AATCTTTTAA CATAGAATGT 

216481 TCTTCAAATA AGGAATTCCA ATAATAAGAC AATTTTCTAC ACTTGATTTT GTTTTTATAG 

216541 CCAAATGGTG TCATTAAATA TAGTCCTGGC CTGAATGGCT TTCTCATTAA TGATGCTAAT 

216601 TATTTTGGTT TGTACATGTT AACCAGGTAT TGTACAAAAA TATTTCTTTT GGGAATCCAT

216661 AATGGATGTA TGGCTTGAAT ACAAATAATA CTGTCTCTTG TAAGTGCATT GGAAATTTTT 

216721 CCCTGCCACA TGATTTCATG GAAGGTTGTT TCGTGTATGT ATGACTGCAA ACCTGACTAT 

216781 TCAGATCTTC CGCAACAAGA CAACTTATGT GTGCATTAAG AAGTTGCTGC CTAAAATACA 

216841 TAACACTGTA ATCATTGGAG ACTTTAAAGT AATTAATCAG CTATGCAATG CCACGCTCCT 

216901 GTTATCTCCA GAGGGCTCTG ACATTGACAA ATGGTGGCTT TCTATTTGAG ACGTAATATC 

216961 TAAAAAGCTT TAACAGGTTT GTAGAAGGAT TGAAAGAAAG AATGGGAACA TTTAGGTCCT 

217021 TATGGTAGAA TAAGCATTAA TTGATTAGTG TGTAGAAGGG AGAGGCATGC CACTTCAGAG 
217081 GAAACTTCCT TCCCCCAGTA AACAAATCTA CCTAAAAACT AATTTTATCC CTTCTTCCCA

217141 GGTAGCACTG GCTGTGTCTG CTGTCTCCTA TGGTTCACAG TGATTTATGA TGACCCCATG 

217201 CATCACCCGT GCATAAGTGT TAGGGAAAAG GAGCACATCC TGTCCTCACT GGCTCAACAG 

217261 GTACAGTGCA CACCTTGTAC CTGTGGCCCA TGCAGAGGTC TCTAGGGCAG GGTGTGGATC 

217321 TCCTCTGAGA GGCACCATCT TGGCTGCTCT AATACTCATG CTGATTAGAT CTTTCTTTTC 

217381 AGCCCAGTTC TCCTGGACGA GCTGTCCCCA TAAAGGCGAT GGTCACATGC CTACCACTTT 

217441 GGGCCATTTT CCTGGGTTTT TTCAGCCATT TCTGGTTGTG CACCATCATC CTAACATACC 

217501 TACCAACGTA TATCAGTACT CTGCTCCATG TTAACATCAG AGATGTGAGT TTACTTCCTA 

217561 TACTTCTACG AAAATGATAA TGGTAATAAG GAGAAACAGT TCTGTGTTAC CTATTACATT 

217621 CTGGCTTTAC ATATAACCAT TAATTTAACC TTCACAATGA CCTTGAGAGA GGCATTGTTA 

217681 TAATTCCCTT TTCACAGATG TGGAAACAGG ACACTTAGAG GTGAGATAAC TTGCCCCAGG 

217741 TTGCACAATA CTAAGTGATA GAGCTGCTGC AGCATCCATA TTCTTAACCA CTATGCTATA
217801 CTACCACACC AGCTGATTCC AAAGCTTCTT TTAGAAATAA TATTGCTGGG CCAGGCATGG 
217861 TGGCTCATGC CTGTAATTCC AGCACTTTGG GAGGCCGAGG CAGGCAGATC ATGAGGTCAG 
217921 GAATGCAAGA CCAGCCTGAC CAATATGGTT TACTAAATAT CATCTACTAA AAATACAAAA 
217981 ATTAGCCAGG TGTGGTGGCA GGCACCTGTA ATCCCAGCTA TTCAGGAGGC TGAGACAGGA 

218041 GAATCGCTTG AACCCAGGAG GTGGAGGTTG CATTGAGCCA AGATCATGCC ACTGCACTCC
218101 AGCCTGGGCG ACAGAGTAAG ACTCCGTTTC AAAAACAAAA AACCCAAGAA ATTAATATTG 
218161 CTTTTATCTG GAGCCCAGAG TGATGCAGCT TCTGGCCCTC TTATCTGAGA CAGTGTTCTT 
218221 TTAGTGTGAA AAAGGATGCT AATTTTCCCC CAAACAACCC ACAGTATCAT GGGGGTAAGT 
218281 TAATGGCTGG TCTGTGTAAC TGACAAATTT TGGTGCTAAC GTATCTCTAT AACTACTCTG 
218341 TATAAACTTC CTTCCTTCAG AGTGGAGTTC TGTCCTCCCT GCCTTTTATT GCTGCTGCAA 
218401 GCTGTACAAT TTTAGGAGGT CAGCTGGCAG ATTTCCTTTT GTCCAGGAAT CTTCTCAGAT 
218461 TGATCACTGT GCGAAAGCTC TTTTCATCTC TTGGTAAGGA TAAGCGTGTG GGCCCATTTA 
218521 ACCAATCCCT TTTCTGCACA TGGTCTCAGA GGGTTCCCTG ACAGCATGTC CTCATTGCCC
218581 AGGGCTCCTC CTTCCATCAA TATGTGCTGT GGCCCTGCCC TTTGTGGCCT CCAGTTACGT 
218641 GATAACCATT ATTTTGCTGA TACTTATTCC TGGGACCAGT AACCTATGTG ACTCAGGGTT 
218701 TATCATCAAC ACCTTAGATA TCGCCCCCAG GTAAGAGCTC TACCTGTTTT TTCCCCTCCT
218761 CCAGACCCCT CCAGAGGTGT TAGACCTCAG TGGTCGCCGT GAAACTCTTT AATGTTACTG 
218821 ACATTGCACT AATGGCAGAA TGACAAATAA CTACAAATAT CTGTCTGTGG CCATTTTTAG 
218881 AACAACAAAT GTGGCATTTT TAGAACAACA ATTTCCAATC TTGGCCAGTA ATCATTTTGA
218941 CAAAAACCTT CCCAAGCTTC CCTAACAGAG ATTGAACTGT GTATGCTGGG AAAAGGCCCA
219001 CACACAGGTG ATTTGGAAAA GTTTCCATGG TGTTGTTCAT ATTAGCTACC ATATATATAT 
219061 ATATATATAT ATATATATAT ATACAGTCAC AATAAGCCAG CTCCTGTGCC AAGACTTGCC 
219121 ATATATCAAC ACATCTAATC CTCACAGTTA TATTAGGTAG GCCCTATTGT TATCCCCATT 
219181 TTATAAGGGA GAAGGCTGAG GCACAAGGAG GTTAAATGGT GTGACTATGG TCACATAAAG 
219241 GCAGAGCCAG GATTTGGACT GGGGGAGTCT GGCTTTGGAG TCTGTGTCCT GCCCGTTGCA 
219301 CAAACTGGCT TCTCCACTGA GCAGCCGGGG TAAAGAAACG TGGTTCCCAG AGAGACTGCA
219361 TTGCTCCCTG GTTATTGACT TGGTAGATTG GTAATTTCAG GTTTGGCAAA TAGACATTGC 
219421 CCTGAATGTC TTTAGGTGAA TGAAAAACTG CATTAAGCAA AATGACTTTG CCATTAGAGC 
219481 TGAATTGCAT TAAAGTTGAG TTGCTGCAGA AGCTGTAGGT GGCTTTCTAT ATAAAATCAT 
219541 TTATAAAATC ATCTTCCCAC AGATATGCAA GTTTCCTCAT GGGAATCTCA AGGGGATTTG 
219601 GGCTCATCGC AGGAATCATC TCTTCCACTG CCACTGGATT CCTCATCAGT CAGGTTGGGC 
219661 CAGTTTATTG AACATCTTCA AGTGGCAGGT ATTGTTTTAG GTGTTGGAGA TACACACGGT 
219721 GCTCTAAAGA TCTGGATGGC AACACAATTA CTCTATTTAC ATGAGCCTCT AAATCAGACT 
219781 CTGGTAGGTC AGATTTCCCA GAGGAAGAAA AATATAAGCT TATTTTCTCA AGATGAATAG 
219841 ATGTTAGATT GATTAAAATG AGCTGTTCCG GTGCAGAAGA CAGCACGTGT GACTTCCTAG 
219901 AGGTACATGA GCATGAAACA GTTCTTAGTT ATGACCAGAA TGAAAGACAC ATGTCAAGGA 
219961 ATAGCAAGAG ACGAAGACAG AGGGGCAAAA GAAGATCATG AAGAATATGT TCAGACTAAT 
220021 CCAATTTTTA AAAAATCACA AAAGGGAAAC AAAGTGTCCT AGGCCAGTTT AAAGATAATT 
220081 TAATGTCTGG AAACAGATCG GCTGTGAGAC ATTGCAAGGA GGCTTGCTCG GTGTTTGGAA 
220141 ATGCAGGCTC ATGAGGAAGA TGAAAAGACA GACCCAGGCA GGGATGGAAG GACTGACGAG 
220201 AACCAACTTA CAAAGAGAAG TTTTGTTTTT ACTACATTTC TATGTGATCA AGTTCCCAGG
220261 TTAATATTTG ACTAAACTGC TAGGAATCCA CTGTGACTAT AATGCTGGAA ATGACTTAGT 
220321 AGGGCTTTCT GAGGAGGGTC ACACAGAAGA CCAAAGAGAA CTCATGTTGA ATTGAGATGG

220381 GTTGTAGTGA TAGTTGTCAA CAGCCAATAC AGAAACAAAA AAAAACAAAA CAAACAGCAA

220441 CAACAACAAC AAAAAAAAAC AGAGAAGACA CAAACACAAT GCCACAATGC CATTTTAGGC

220501 ATAATTTTAA ATGAGTAATA TTATATGTTG AAATCCAAAT TTTCAGAAAA ACATTAGTGT

220561 ATTTTATTTT TGTTTAAAGA AATAACCATC TCAACTCAGA ACCCCATGTG CATTTTGGCC

220621 ATTTTGTTTC CAATAGTTTC ATAAACTTTC TTAAGTAACT ACTGCACATT GTTCCTTATA 

220681 TTCCTTGTGA TCAACATTGC AATACACAAC TGGGAGGGCT ACTAGAACTG GTGTAGAAGG 

220741 AACTTGTGAG ATTGATCATT TTCTCTGTTT TTTACATCTA GGATTTTGAG TCTGGTTGGA

220801 GGAATGTCTT TTTCCTGTCT GCTGCAGTCA ACATGTTTGG CCTGGTCTTT TACCTCACGT

220861 TTGGACAAGC AGAACTTCAA GACTGGGCCA AAGAGAGGAC CCTTACCCGC CTCTGAGGAC 

220921 ATAAAGTTAC AAACTTAAAT GTGGTACTGA GCATGAACTT TTTAAACATT TTTTACTTCT 

220981 CTCCATATTC CTGACCATAG ACTCAGCAGT TCTTAACTCT GGCTGTGTGT TAGTCTTCCC

221041 TGGGGAGCCT TTATAAGACA CTGATACTTG GGACCCACTC CAGAGATTCT GAATGAATTG 

221101 GTCTGGGGTG GAACCCAGAT ACTACTAATT TTTAGATACT CCTTAGAGGT TTCTAGCATG 

221161 CGCCCGGGGT TGACAACAGC TGGACAAACT TGAAAAGTCA ATTCATGTGG CCTTTGAATT 

221221 TTCCTCATTG GAAAGTACTA AATAAATAAA AATTCATGTG AAAATGATCA CTGATAAATA 

221281 TCTTCATGGT GGGGCAGGTT ATTGGATGCA GAGAAGATCT GCTCGGAATT GTAGCCATAT 

221341 GTTACAGATC TCAGCACCGA TCGGAACTGT AAAGCTATAA TCCCCAGAAT TAAAGTTTTT 

221401 ATTATTTTTT ATACATTGTA AAACATAGAC GTTTATTTAT GTGATTAAAT TCTATTAAAA

221461 TTTACATGCT AAAATAAAAT AGACCATTTT CAAATTATTT AGATCCAGAT ATTTCCATCA 

221521 GATTAAACAG ATATTTATTT ATCCTAGCCC AATTGCAAGA GATTAATGAT GAGAAAATGA 

221581 CCAATACAAG ATTAAATAAA TGAGGTTAAC TTAGAAATCA AGGACAGAGA AGATAGAACT 

221641 GGAAGGCTTG TATTGTGAGA AGAATGAATG TGAAGGAAGG CAATGTAGAC ACTTCCAGAA 

221701 GGGATAGCAA TATAGTTTAG ACCATATAAT GAAAATTGGA GAGAGATGAC AGAGACACTT 

221761 TCAAGTGAAA TGACAATTTA TATGGGGGAG AAAAATATTG AAGACATAAC AAGATGAGAA 

221821 AAGGCATAGA AATGTATCAC ATACAAGGCA TAGAAGTGTA TCACATACAA GAGAAGTTCC 

221881 TTTTGAGCGT AGAAAAAGAT AATTTAACCT TCTTCATATT TTTCTTACTT TCCCAAGATA 

221941 CTCAGATAGG CAGCGTCAAC TCTAACAGGA ATTAATTTGG CTCCTAACAC TTAAGACATA 

222001 TCCTTTAGTT TGTCTCCTCA CACAGAACTG ATTCTGGTTT TGCCACAACA TGTCTAGAGA 

222061 AGAAGTTCCC ACCATATTTT AAATCCTATT AAAAAACTGC TTGGACAAGA ACCTTGGGTT 

222121 AATTCAGCAG ATGAAGAGAA TCTCCTAATG CAAATCAATG GGTATTTTTG AGCAAGTTTT 

222181 TCAGAAAAAC AGAGTGTCAG GCCCTGAGGG TGGTACTAAG ATGAGAACAT TGATTTTGCC 

222241 TTCATGATAT TGACAACACA AAGAGGAAAG GGGGTTTGCA GAAAACTAAA AGAAGAAGTA 

222301 GAAGAAAAAA GAAAGACATA GTATAATAGG TAGTCAAATT ATGTACAGAA AAAAGAGAAA

222361 AAAAAAACAA AAAAGGGTGG GGGACAGACA ACCCAACTAA AAAATGGGCC AATGACTTGA 

222421 ACAGGGACTT CATAAAAGAG AAAATGTAAG TGGCTCCTTA ACATATAAAA AGATGTTCAA 

222481 CTTCATTAGT CATTACAGAA ATGAAAATCA AAACTACAAT GAAATACCAC TATAAAATTA 

222541 ACTAATGGAT AAAATGAAAG GAGATGGAAA ACAAAATGTT GCCAGACATG TGGAGCAACT

222601 GGAACTTTCA TACGTTACGA ATGTGAACTT TGGAAAGCTG CTCGGCAATA TCTCCTAAAG 

222661 CTAAATGTAC AATTCCAGTG ACTCAAACAT TTTACTTAGA AATGCACATA TACATCCATA 

222721 AAACATGTAC AACAATGTTC ATAGGAGCAC TATCTGTAAT AGCCTGAACA GGAAGTTGTC 

222781 TGTTAAAAAA AGAATGAGTA AATAAACCAC GGTCTATTTG TATAGCAATG AGAATTAACA 

222841 GACCCCAATA TATAATAGAT GAATGGGTCT CATAAGCACA ATATTGATTA AAGGAAGACA 

222901 AAACGCACAT TCTTTTAAAG GTTTATAAAA TACTTTTTAA AAACAGCTAC AACCAATCTG 

222961 TCCTGTTAAA AATCAGTGAG CGATTTCCCT TGTGCAGGGA TGGGGGTTGT GGCTGGATGG 

223021 ATGGTACTTA AGAAGTGCTC CTGGGGTACT AGAAATATTT TATTTCTTGA CTTGGATGTG 

223081 TGTTTACTTT GTGAATATTG TACATTTATG ATTTGTGCAC GTTTATGAAT GTAGAAAATA 

223141 AAACAGAAAG CAAATTCAAA GTATCATCCT TTTGAGAGCT TCTGCTCTGA CTTCGTTTTG 

223201 ACCAATGGAG CAGTTGGGAA GGGGTCTTGG TCCTTCGGTC CTTTGCTTTT TTTTTTTTTT 

223261 TTTTTTTTTT TAGACAGAGT CTTACTCTGT CGCCCGGGCT GGAGTGCAGT GGCTCGATCT 

223321 TAGCTCACTG AAAGCTTTGC CTCCCGGGTT CATGCCATTC TCCTGCCTCA GCCTCCCCAG 

223381 TAGCTGGGAC TACAGGCACC TGCCACCATG CCCGGCTAAT TTTTTGTATT TTTTAGTAGA 

223441 GACGGGGTTT CACCATGTTA GCCAGGATGG TCTCGATCTC CTGACCTCGT GATCCGCCCA 

223501 CCTGAGCCTC CCAAAGTGCT GGGATTACAG GTGTGAGCCA CCGCGCCCGG CCCCTGGTCC 
TCTGCTTTCA
CAGAGCAGGA 
GTTCTCAAAC 
CAGTTGCTTC 
CTAGGTTTTC 
CAGATGACTT 
GAGTTACCAT 
CATTTCTAAT 
AATTAATTCA 
TTTTCTCCTT 
TCCCTTGAAT 
TTTGGCTGAT 
TTTTTTGTGT 
TTTTTCCCCA 
ATTTGAAAGC 
AGACTTTGAT 
TTCTTTGATC 
TTAAAGTATA 
ATTTTATCTA 
TTCTTTCTCC 
ATTTATTTAC 
GTGGCGCGAT 
CAGACTCCCG 
TTTTAATAGA 
ATGATCTACC 
AGCCCTGCTT 
CTATTTCCCT 
TTAATTATGA 
GCAGATTACA 
TCAAGTTAAT 
TTTTTTTTTT 
CTTCAAGGGA 
ATGCCTAGCC 
ATCATCTCAC 
TAATGTATTA 
TGGGTATTAT 
CAAGTGAGGT 
ACAGAATTGG 
CACTTAGCAT 
TATCCTTCTT 
GCCTCTGAGT 
AGCCTTAAAA 
CTTATTCAGA 
TAGTTAGCCA 
TTTATCCTGG 
TGGCTGGTGG 
TTTTCTTTAC 
TTCTGGTTTC 
TGGTAGTGGA 
TATTTCCTAA 
AAAGGAAAGG 
TGTCAACAAA 
TTCACCTTGG 
TAATCACTGA 



TGTTCTTCTT 
AGGAAGGCAA 
AGCAAAATTA 
CAATTGCATC 
CCAGCAGCTT 
CGGTGTGTCA 
TCACATTCCT 
TCATCTCCCC 
CTTTTCTCAA 
GACTGTTAAA 
ATTCTTTTGA 
GTACTGATAT 
CTGGATATGG 
TTACTCTGAA 
TTGAAAGCAT 
AACTTTCTCA 
AATTTTGACC 
TTTGCAAAAA 
TATCTGAGGT 
TTCATTAGAC 
TTATTTATTT 
CTCGGCTCAC 
AGTAGCTGGG 
GATGGGGTTT 
CACCTTGGCC 
GTCTTTTTAT 
TTGCTTTACT 
AACAGGTTAA 
TTTTGCTGTG 
AACCTATATA 
GTAGATACAG 
TCCTCCTGCC 
ATGTCTCTCT 
TCTTGGTTTC 
ATTTTGCATT 
ACTTTTCACT 
GCCCAGGAAG 
CACATGAGAG 
ACCCCTGGAC 
CATCTCAAAA 
CCCACAGTAG 
CATTGTAATA 
ACAGTATTGA 
GCTACTTTTT 
AATTCCTTCA 
TCTTAGAGTT 
TAAGAATCTC 
TGCTGACTTT 
GGCAGGCAAA 
AATTGCCTCA 
CATCCACACT 
GGAGTACTTC 
CTCTTGGTTT 
GAATATGCAC 



GGTCCTGTTC 
TGGGTCAATC 
ATGAGCTCAG 
AGTTGCCACG 
CTCTGAGGTT 
GACTTTCAGG 
AATGGCTTCA 
TCCCCATCCC 
ATAGTTTATT 
TATTATGAAT 
TGTACGACAG 
ATGAGATTGG 
AGCTTATGCT 
AAAGATTGAC 
TGGTTTGTAA 
ATTTCCTTCA 
ATTTATGTTA 
TTCAACTGTT 
TTTAGCTTCT 
TACTTAGTCA 
TTGAGACGGA 
TGCAACCTCC 
ATTACAGTCA 
TGCTATGTTG 
TCCCAAAGTG 
TTTATATTTG 
TCATATAAAT 
AGCTTAGAGG 
TTGTGCTCCC 
GTAAAAAAGT 
GGATCTTGCT 
TTGGTCTCAC 
CCTTATATAT 
ACTACTGTTC 
GAGTAGTTTC 
GTTATTTGAA 
CAATATTTAA 
TGAGTGCCTC 
AATGAAGTGT 
CATTTCAATG 
CTGAGAATTT 
TTAACTTAGC 
CTTCCTGCTA 
GTAGGAGAGC 
CCAAGATGTG 
TCCTTCGATT 
TCTTCTATTT 
CATTTTTGGA 
CACTTTCCAA 
GAATGTGCCT 
TTATTTAGGT 
CAAATATTGG 
GCCTGCTCCC 
AGTATTGTAT 



CTCCTCCTCT 
GATGCTGTCA 
GCTTTGAAGA 
GGTGATAAGA 
TTCCCAGCAG 
GTATCTTTCC 
GAATAGATGC 
TAAAGGATTG 
GTCATCTACC 
TATATTAATG 
AATTTGATTC 
CTCTGTATGC 
GATTTCAAAA 
TAGAATGGAA 
AAATCATGCA 
GTTACTGGTC 
TCTTGGAGGA 
TTATCAGGCT 
TTGTACTTCT 
TTTACTAATT 
GTCTCACTCT 
GCCTCCCGGG 
TGCACCACCA 
GCCAAGCTGG 
CTGGGATTAC 
ATTAGCTTTA 
TTTGTTTTGG 
AAAATTGCTC 
AAATTCATTG 
GGCTGTTGAC 
GTGTTGCTCA 
AAAATGCTGG 
AATAAGAAAA 
TCTGGAAGTT 
CATAGAAGAA 
CATAATTTGA 
GGAGGCATCC 
CTTAATTTTG 
TTTTTGTTTT 
GAGTATTTTT 
ATTTCATAGT 
TGGGAACAGA 
GTCTCTTCTG 
TATGTTTAGG 
CCAAGGTGTT 
TTGTTTTATT 
ATCTGTATGG 
CCTTTTACTT 
AGTCTTTCTC 
ATGTCCACAA 
GCAATGCCTG 
TTTGGGGATA 
TCTTCTTTTA 
GTTTTATTAT 



TTTGTTGGAA 
GCTTTTGGAT 
AACCATGACC 
ACAATGATGA 
CTTCTCTGAT 
TTATGTGATG 
AATTGTGAAC 
TTTCTAACAA 
TAATGATGAG 
TATTTCTTAA 
ACTAATAGTT 
ATACATGTGT 
ACAAGAAAGG 
TTTTTATAAT 
GGCTGAAAGC 
TTTTAAGGGG 
TCATCTATTT 
ATCTTTTTAA 
GACCCAATTG 
TTAAGAATAG 
GTCACCCAGG 
TTCAAGTGAT 
TGTCTGGCTA 
TCTCAAACTC 
AGGCATGAGC 
TCTTTTATCA 
ATAGTTTATT 
CTCTAAGTCC 
TTCTTTTAAT 
TCTCAGCTTT 
GGCTGGTCTG 
GATGACAGAC 
CAGACACACT 
TTGCTCTGAC 
TTATAGCATT 
GGGCTGAAAC 
TTTCTTAGGC 
AGTGCTGGAC 
GTTTTTTCAT 
TTGGAGCAGT 
ACTCTTTATG 
AATTTTGTTC 
ATGTCCAATA 
CTAGGTGCTA 
AATCATTTTC 
TAGTGATTGT 
TAAAACCTTG 
TGCTTTCTCC 
AATTTCCATC 
TATCCCTCCT 
AAGTGTAAAC 
ACCTGCTAAT 
TCTGCTGTGT 
AAGAGAGGAC 



CTTCCAGTAT 
CAAACTGCAA 
CTGAAAGCAT 
CTCAGAATGC 
TGATTCCTGA 
GTTTGAGGAA 
TGATAGGAAA 
TAGTCATGAA 
ATGACTTACT 
TGTTGAGCTT 
TATTTAGGAC 
TTTGTGTATC 
AGAACTTTCC 
TGCTGTTGTT 
CATTTTGAGG 
TTTTATATTT 
TACACACTAT 
TAATATATTC 
CATGTGTGCT 
CTTGTCTTTT 
CTGGAGTGCA 
TCTCCTGCCT 
ATTTCTGTAT 
CTGACCTTAG 
CACTGCGCCC 
AGCTTATGTC 
TATTTTTCAT 
AATTTTGTGG 
GCTTTATTTC 
TTTTTTTTTT 
AAACTGCTGG 
ATGAGACACC 
GAGGCATCCT 
CTTTTGCAGT 
TGCATTCTGT 
CAAGATGAGG 
TCATGCAAGA 
ACTTCTTGCT 
GTCCATCCTT 
ACTTGGATGA 
ATCACTGTGG 
CACAATTTGT 
TGAGGAAGTC 
TAGGATTCTC 
TCTTGCTTTT 
CCTCAATTTG 
TTGCCCATCT 
ATGGACTTTT 
AATTTCAACT 
TCCACTTTAG 
ACTTTCTGGT 
GATTAACACA 
GTATTTTTTT 
TGGCCAGAGT 
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GGGAATGTTC 
TGAAGCTGGC 
GAATGATACA 
ATTCTAATTA 
TATTTACAAA 
AAGGAATGCC 
ACCATTTAGC 
ATAATCCTTA 
GCAGTGAGCC 
CAAAAAAAAA 
CGCCTTAACA 
TCAGCCCATT 
TTTTTTTTTT 
CAATCTTGGC 
TCCAAGTAGC 
TAGAGACGGG 
CACAATCCTT 
ATATATGTTC 
TCTTGAGAAA 
TTTCATATTG 
TAAAGAAACC 
GGGGTTACTC 
CTGTAGCAAC 
CAACATTTAA 
TTTGCTTTAA 
CAAAGGGAGT 
GATAGTTCCA 
AGTTTATATA 
AAGCCACCCA 
CTCCAAGTGC 
ATGATCAATT 
TCCCTGCAGG 
AAATCACTTT 
TTTTAATATG 
TCAGTGGTCC 
AAATTGGTGC 
AATCAGTGGA 
GCCAATCTGG 
TGACTGTTTA 
TAAACGGGCC 
TCCAAATGGT 
TCCTCTTTTG 
GGTAATTTTT 
GATGCTCTCC 
ATGGGATTAT 
CTGGGTGAGA 
TGGATAGCCT 
TATGCCAGGC 
GCCCATGTGA 
AATGCTAAAA 
AAAGCCCATG 
AATGACTTTG 
TCTTAGTGTG 
TTATCCAAAC 



TGAATTCAGA 
ATATTTTCCC 
ATAAAGTGGT 
TAGTCACTCT 
ACAATTTATT 
TAAAGTTTTC 
TATCCAAATT 
AAAATTTGCC 
AACACAGTGC 
AAAAAAAAAA 
TTATTTGTTC 
GTCATATTTT 
TTTTTGGAGA 
TCACTGCAAC 
TGGGATTACA 
GTTTCACCAT 
GGCCTCCCAA 
ATTTTGAGTC 
ATCTCTGAAA 
AGAATTGTTT 
ACCTGTGTGT 
TGAGAATCAA 
TTGCTCCATT 
GGTCTCAGAA 
CAAACCCTAG 
TCAGGACACC 
CCAAATAAAG 
AAATTTATTT 
TTTGCCAAAA 
CACTAACAAT 
ATTTCATTTA 
GAACTGCAGT 
CAGGGTGGTC 
TGACTCCTCA 
AGCGCTTATG 
CATGGACATA 
CAGCATCATT 
CACCATGAGC 
GCCATTTTAG 
TTTGCCCTCT 
GGCCTGAATA 
CAGATACCAT 
AATAGAAATG 
ATGTCCTTCC 
TCCATTTTGT 
TGCTATAGGT 
AAGTGGTGAC 
ACCACTCTAG 
AAGAGAATAA 
AGAAAAATTA 
TATATATGTT 
AGAAGTTACT 
TATTCAGTGT 
TTAAGCCTTG 



ATAACTGAAG 
AGAGCACCAA 
TAGAACTTTT 
TCATCTTATT 
TTTTGATGAA 
AAAATTCTTT 
GTTTATTTTT 
TTAGCACAGG 
CACTGCCCTC 
AAAAAAAAAG 
ATTAAAAACT 
GATTTTTATC 
TGCAGTCTCC 
CTCTGCCTCC 
GGCACCCACT 
GTTGGCCAGG 
AGTGCTATGA 
CTTTAACAAA 
AGATGCCAAT 
TTTAAAAAGT 
TGGTTAAGCC 
AGGAAAACCT 
GTTGAAATAA 
GATAATATAA 
AGAGCTGGTA 
ATGATTCACG 
TTGAAATGCT 
TTTCCTTTTT 
TAAAGTGAGA 
TCTTAGGACC 
AATGGCTCTA 
GGCTTTTATC 
ATGTAGTTGC 
GATTCAGAAA 
AACCCACATC 
AGAGGAAGGC 
ATTTACAACT 
TCTAATTTTT 
AGTGTGGCAT 
CTTATGAACA 
CTATTTACAA 
CATTATTCAT 
TAATAATTGC 
AAAAAAAGGT 
TCTTTGTTAA 
ACAATGACAA 
TTTTACCTCC 
GTGCTAGGGA 
GACAATAAAT 
AGCAGGCAAG 
CTATTGGTTT 
GGCTTTTGAT 
TTTAAGAGAG 
CTTTAGGTAA 



CAGTACAGGA 
ATTTCAATAT 
ATTAAAATAA 
TCATCTTATA 
AAGTTTTAGA 
TACATGTTGT 
AAGCAGTATC 
AGAATTGCTT 
CAGCCTCGGC 
GCCAAAAACA 
TTCTTTAATA 
ACTTGCTTTG 
CTCTGTTGCC 
TGGGTTCAAG 
ACCACGCCTG 
CTGGTCTCGA 
TTACAAGCAT 
GTCATAAGAA 
AATTTGTAGC 
TTGTATGTGT 
ATAAGTACAT 
GAAGAAACAG 
ATAGGCTTGA 
TTGGTGAAAT 
GGCAGAGCCT 
ACCACAATAC 
GACAAGAAGG 
TATTGTTATG 
ATCGTTTCTT 
TGAGCTATAA 
ATGTGCAGAG 
AACTTGAACA 
TTTTTTGAAA 
GTGCTCGCTA 
TAACCCTATC 
ACAGTGAAGC 
TTGTAATCAC 
GTTGGAGTTC 
ACGTGGCTGC 
TAGACAGGAA 
CTAAGGTACA 
ATATTTCTTC 
TTCTCAAGTT 
ATGTTGCTTT 
TATATACTTT 
GTGATACGTG 
ACTCCAAATA 
TACAGCAGTA 
AAGTAAAGTG 
AGGACTCATT 
TATTTCTCTG 
TTATCACACT 
CTTGTGGATG 
AAGGGCTCCT 



TAGGAACTCA 
ATATTTAAAA 
ACTTATGTCA 
ACATGTTTAA 
AATCAAGTTA 
ACAATCAAAA 
CCTTCTAATA 
GAACCCAGGA 
GACAGAGTGA 
AATAAACAAA 
CTACTAGTTT 
TAGGACATAT 
CGTGCTGGAG 
CAATTCTCCT 
GCTAATTTTT 
ACTCCTGACC 
GAGCCACCTG 
TTTTAGGAAT 
CAATTATATT 
GAAGATTTTT 
GTATTCAAAT 
GCAGCCTCAA 
ACTTGTATTT 
TTAAGTAAAG 
CAACAGACCG 
ATCACACATA 
GGTAAGAAAT 
GAATAGGACC 
TTGGGGACTC 
GCCAGGTGAT 
GGAACGGAGC 
GCTAGCTTTC 
TCAGAAGATG 
GTCTTAAGAG 
CCCTGGGGGA 
AGAGAGCCCC 
CCAGGAGCAT 
TTGGAACCGA 
TGGCATACAG 
CTAAACTGTG 
ATGAAATTGA 
AAAGTTAACT 
TAGTCTTTAG 
TATTATATCC 
GAGCCACTTT 
TGTTGTCCCT 
TATGTATCAC 
AACAGACAAA 
CATGTTATAT 
GAAAAGATGA 
GAGAGCCCTG 
ATTCGGAGTG 
AATAATAAAT 
CTTACAAGGT 



TTCTTTCAAA 
AACTTGATAT 
TGAAATACTT 
TGTTTTCTTT 
AAAATATTCA 
GAGTCTGAAG 
TTTACTATTT 
GACGGAGGTT 
GACTCTGTCT 
CAAAAAAATC 
CCCTTTCCTC 
GAGGTTTTTG 
TGCAATGGCG 
GCCTCAGCCT 
GTATTTCTGG 
TCAAGTGATC 
CCCAGCCAGA 
TCAGTTACTT 
GATTTCTCTT 
GCACTGTAGT 
AAATTGAGGT 
AAGGTCTTAG 
TCCCTCTACT 
TGCTCACTCT 
TTTTAGCTTC 
ATTGAGAAAA 
CTTGGAAATA 
AGTTCTACTT 
CTCTTTGTAG 
TTCAGTTAAT 
CCATCAGCAT 
AACTGTTTTG 
ATTCTGCCTC 
TGAATTACCC 
ACTATCAGAG 
GCATGATGAA 
GAAAATCCAG 
TTCTGATGAA 
AGGTTGGATG 
TCACATAGGT 
GTAAGTCTTT 
ATTTGTATTT 
TCTTAAGGTT 
TCGCCTTCAG 
TTTTGTGGCT 
GTCACAAAAG 
ACACCAGCCG 
TGCAACCCCT 
GGAGGTGGCA 
CATTTGGGTA 
ACTAATACAC 
CTGAGAGCCT 
AGGACAAAAT 
AGAAGGTTAT 
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TATTTGGCAT 
CACAACTGGG 
CATGGGAGAC 
AAATCAGTAA 
ATTTGGAAGG 
TGTATCTGAA 
GGCCCCCTGC 
TCTTCTTTGG 
CTGAGTCTGT 
AAAGAATTTC 
TATTTTTTCT 
CTAACTCCTG 
CTGGAGTGCA 
TCTCCTGCCT 
ATTTTTGTAT 
TAATTCTTTG 
GATTCCCAAA 
AGCTCTTACA 
AGTCACCCAA 
ATGTAATCTC 
CCAGGCTGGA 
GTGATTCTCC 
GGCTAATTTT 
AACTCCCAAA 
TGAGCCACCA 
CACAAAGTGG 
CCTGGTGCTC 
GTTTTTATCT 
AGGAAATACA 
CCTGTTTCCC 
TTAATTTAAA 
ATTATCTCTG 
GAACTAGAGA 
TTTTTATTAT 
TAATAATAAG 
GCGCTTTGGG 
CAACATAGTG 
ATGCTTGTAC 
TTGAGGCTGC 
CTTGTCTCAA 
TGTTTCAAAA 
TTAAAAGTTA 
CAGAAAATCA 
CTCCCAGGTA 
AAATTCAAAA 
GCTCATGGAA 
ATTCCAAATC 
GTGTTAGCTA 
CTATCCTCAC 
AGTTGCAAAG 
TCCCCCATTG 
TTGATACATG 
CACTAATGTT 
CATATCTCCC 



TTAAATCCAA 
TGCAAAATAA 
AGCACGAAGC 
GACAGAAGCT 
GGTTGCTCTC 
GCCATCATGC 
ACCCTAGAAA 
ATCTGGACTT 
CACTGCTGCT 
ATAGCTTCCA 
CTTGGGTGTT 
CTTTTTTTCT 
GTGGCACAAT 
CAGCCTCCCA 
TTTTAGTATT 
TAGGTATCAA 
CACGGTCTTT 
CAAACACGCC 
CAGTGTCCTT 
CCAGAGGGTG 
GTGCAGTGGC 
TGCCTCAGCC 
TGTATTTTTA 
CTCAGGTGAT 
TGTCCAGCCC 
ATATAACAAT 
TCAAAGTTTT 
GTACTATGAT 
TATAACTGAA 
CAACTTCATT 
AGAAGTAGTC 
GAAGGATACA 
GCATGGCCAA 
TTTCTTTTGT 
TTTAAAATAA 
AAGCAGAGGT 
AGACCCTGCC 
TCCTAGCTAC 
AGTGAGCTAT 
AAAAAAAAAA 
TATGTAATAT 
ATAATTATTG 
TCCATATCAG 
ATTAGCAGGC 
TGCTCCAAAA 
TATTTCAGAT 
TGAAAAAATC 
ATTAGACCCT 
TTCTAATAGC 
ATAGTACAAA 
TTAGGATTTT 
AAACTCTATT 
TTCTTTCTGT 
TAGTCTTTTT 



CTGAAGACTA 
ATGGAACTGC 
TAATCCCACT 
GGTCAGATTA 
ATTAGGCAGT 
CTAGTTATGG 
GCTGGGTGGG 
TACCTCTATC 
AACTCAGCAG 
GCATCCTCTC 
GCAGCTCTCT 
TTTTTTTTTT 
CTCGGCTCAC 
AGTAGCTGGG 
GCTGTCATCA 
ACCCTAGGAC 
TCATATACAT 
CTCCCCTAGG 
GTCACATCTT 
TTATCATCTT 
ATGATCTCGG 
TCCTGAGTAG 
GTAGAGACAG 
CCACCCGCCT 
CATCTTTTTC 
ATTTTGAATT 
ATGTTACAAA 
TTCAAACCAA 
AAATTTTGGT 
TTCTATAGCA 
TACCATCTCT 
CAGGGAACAT 
GTGGGGTTTT 
AGGTTTGAAT 
AACTTTTGGC 
GGGAGGATAC 
TCTGTAGAAA 
TTGGGAGGTT 
AATCACCCAC 
AGGGGGGGGG 
TTAGCACTAA 
TCTCCTTTAA 
CAAGCTAAAC 
AGCCTCTACT 
TCGGCAACTT 
TTTGGATTTT 
TGAAATACTT 
TCATGGTCTC 
ATGAACTTTT 
GACAGTACAG 
ACATTATTAT 
AACCAAACCC 
TCCAAGGTCC 
TTGTCTGTGA 



ATAAGACTAA 
CATGCTCGCC 
CATCTTGCAG 
TCAAGAGCCC
GCCTGACCAC 
TCCCCCACTG 
TTCTACTGTC 
TGATTTTTTT 
TTCTAGGGTC 
TCCTTCATTA 
CTCTCCTTCC 
TTGAGACGGA 
TGCAACCTCC 
ACTACAGGCG 
ATCCACATGT 
TCTTTCCTCT 
TTTCCACTGT 
AAGCCTTTAT 
AGGTTCTACA 
TTTTTTTGAG 
CTCACAGCAA 
CTGGGATTAC 
GGTTTCACCG 
CAGCCTCCCA 
TTTTAGTTTA 
ATGAATAACT 
AGAAAAACAA 
ATAAAAAACA 
ATGTTAGTAT 
A7AAAAAGAA 
TCTGTTAAAA 
TGCTCTGGTT 
GCTTTTGTTT 
TTCAAACCAC 
TGGGTGCAAT 
TTGAGGCCAG 
TAAACAAAAA 
GAGGCAGGAG 
TGCACTATAG 
AAACAAATAA 
AGAATTCTGA 
AAGAATTGTT 
TTTCTCAAAA 
CAGGTTGAGT 
TTTGAATGCT 
TGGATTTGAG 
CTGGTTCTAA 
TTCTAGACCT 
CTGTTTTAGA 
GAGAGTTCCC 
GATACATTTG 
TAGACTTTAT 
AATCTGGAAT 
CAATGTCTCA 



TTAATTAAAA 
AAGTGTGCAT 
GTTGCTCCAT 
TAGTTAAACA 
AACAAGAGAT 
TTCATGATGC 
TGCTTTACTG 
TTCTAATATA 
ATTGCCCCAT 
TACTTTGATT 
CATGTCTTGT 
GTCTCGTTCT 
GCCTCCCGGG 
CTCACCACTA 
CCAGAAGCAC 
AATCACAATA 
ACATACTTTC 
AAATGTTCCC 
CCTTTATTTG 
ATGGAATCTT 
CCTCCACCTC 
AGACGTGTGT 
TGTTGGCAAG 
AAGTGCTGGG 
GTTCTTAACA 
AAATGAATAT 
GTCTAAAATA 
GGTGGGGTAA 
GATAATACTA 
ACAAGTAAAT 
AGAAAAAAGT 
TCTTCCAAGA 
TTGTTTGTCT 
ATAAATCTGT 
GACTTACACC 
GAATTTGAGA 
TTAGCTGGAT 
GATCCTTTGA 
CATGGGCAAT 
ATAAATATAA 
ATTGTAGAGC 
ATCAAAGTAT 
TGACATATCC 
ATTCCTAATC 
AACATGATTC 
ATACTCAGTA 
GCATAAGGGA 
CAGCTTCTTC 
ATAATTTGGA 
ATATATCTTT 
TCAAATATAA 
GTGGATTTCA 
ACCACACTGC 
GTCTTTTCTT 



GTTTTTAAAT 
GAGTGGTGTG 
TTTTCTCCTA 
CAGCAGTAGC 
GAACAAGCCC 
CTGAAAGGGA 
CTAAAAACCC 
TGATTTGGCA 
TGCCTCACAG 
TCAGCATTGC 
TGGTTTTCTG 
GTCACCCAGG 
TTCAAGCTAT 
TGCCCCACTA 
CTAGAAACTC 
TATAATCCCT 
TGACCTGGAA 
AGGAAGAATC 
TTCTATCTGA 
GCTTTGCTGC 
CTGGGTTCAA 
CACCACACCT 
GCTTTCCTCG 
ATTACAGGTG 
AATAGTCTGA 
TTCCAGATTT 
CCTGCCTCAA 
AAACTGAAAC 
GGTCATTTTT 
GTATATTAAT 
ATTTTAAAAA 
GAGAAATGAG 
ATCTGTTAGC 
TACATGCTCA 
TGTAATCCCA 
TCAGCCTGGG 
ATGGTGGTGC 
GTCCAGGAGT 
AAGGTGAGAA 
ACAAAACTTT 
TAAAAAGTAC 
AATTTTTATC 
ATGTAATTAG 
TAAAAATTGG 
TCAAAGGAGT 
TAATGCAAAC 
TACTCAACGT 
AAGGTAACCT 
TTTTCAGGAA 
CACCTAGCTT 
GCAACTCACA 
CCACTGTTTC 
ATTTTCTTGT 
GCTTTTCATG 



Figure 8 (Page 72 of 73) 




WO 98/14466 



PCT/US97/17658 




233281 ACCTTAACAG TCCTGAAGAT CATTTGCTTT TTTTTCATAA TTACACCGGA GTTATAGATT 

233341 TTTTGAAATA ATACCACAAG GGCAAAGGGC CCTTCTTGTC ACATCATTTT AGGGAGAACA 

233401 TGATATCCAC ATGACATCAC TGATATTAAC CTTCATCATG TGGTTTAGGT AATGTTTCAG 

233461 GTTTCTCTAC TGCAAAGTGA TTTTTTTCCC TTAATTTAGC CCACCTGAAC TTATCAATTT

233521 TGTTTTCTTC CATGACTAAT ACTTTTGTTA TTATAGCTAA AACTTCATTG GGGCCAAATC 

233581 TTAGATCATG TAAATTTTCT TCTATATTTT ATTCTAAAAG CTTGTAATGT TTGATACATT

233641 CTAAAAGATG TAATGTTTGA TACATTACAT CTAGTCCTTT GATTTATTTT TAGTTACTTT 

233701 TGTATAAGGT GTGAGAGATG TCTCCAGTTT CACTTTATTA ACACATTGTG GTGTTCCAGT 

233761 ACTATTTGTT GCTAAGACTA TCTTTTTTCC ATTGATTACC TTTGCCTTAG TTGGCAATAT 

233821 TTTTGTTGGT TTATTTCTAG ACTGTTTATC TCATTCCACT GATTTGTGTC TATCTTTTTG
233881 ACAAAACTGT TGATTACAGT AAGCTTTGAA ATAGTTCATT TTTTGTGTCA ACTTGACTGA 
233941 GTCAGGGGAT AACCAGCTAT CTGGTTAAAC ATTATTTCTG GCTGTGTTTG TGAGCGTGTT 

234001 TCTGGATGAG ATTAGCCTTT GAATAGGTGA TCCTAGTAAA GTAAACTGTC TTTCCCAGTG
234061 TGGATGGCAT TATGCCACCT GATATTCAGG GTCTGAATAG AAGAAAAGGC AGAGGAAGGG 
234121 GGAATTTGGG CCTTTTTTTC TGCCTCACTG CTTGAGCTGG GACATCTCAT CTGGTCTCCT 
234181 GCTCTTGAAC TGGGATTTAC ATCATCAGTT CCTCTGGTTC TCAGGCCTTC AGATTCAGAC 
234241 TGAATCATAC CACCAGCTTT CCTGGGTCTC CAGCTTGCAG ATTACAGATC ATGGGACTCC 
234301 TCATCTTCCA TAAATGCATG AGCCAATTCA GTCTATGTCC TTGAAAACTG CCCCACTGCA
234361 GATTAAGGCT TTTTTCCACT AGGTGAAATA AAGAAGCTTG TTAGACAGAT TTCCCTTCAT 
234421 CCAGTGCCCT CTCCTCTTTA AGTTACAACA CATTGGCTAC ACCTAAGTGC AGGGGTGGGG 
234481 ATGAGGGTAT AGTCCTCTTG TTTGCTGAGA AGAGAACTGT ATTGGGAAAG CTCTAGAAGT 
234541 GTTTGATACA TACATAAACA AGGCATGGTT TTTGCACTTA ATTTCACATT ACATTTTTCC 
234601 CAGAAAAAAA GGAATGTATA GGCATCACGT AACTGTACTA GCTGGAGTCA TTCTTCCTGA
234661 TTATCAAAGG TAAACAGTTA TTAATCCTAT ACCAAGATGT CAAGGAGAAG TACTTTTGGA 
234721 ACACAAGGAA TTCTCTGGGA GTCCTTACTA CTCTCAAGCC CAGTGAAAAA GTTAATGAAA 
234781 AACTATAGTA CCTTCCTATA AGCTGGATGA CTAATTACCA GGCTCATTTA GGAATTTGCC 
234841 TTACCAAGTA AAACATAAGG GCAGCTGAGG TGCTGACTGA AGACAAATGG AGCATAGAAT
234901 AAGAGTAGTA AAGAATGCCA AAAATGCTGT CATGTATCCA TTGACAAAAG GAGCTATAAA 
234961 GCCTTTAGGT ATTTTCACAC TTGCTCTGTT ACGTAAATGT ATGTGTGTGT GTGTGTGTGT
235021 GTGTGTGTGT GTG 
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1 CACACACACA CACACACACA CACACACACA CACAAATGAG GTATATAAAG GGTCTCCTAA 
61 AATGTCATCT GATATTTGTT ATTTCATATT CTCAGATTTT TAATCCATTT AGGTAGGTCT 
121 ATTTTAGATA GCCTTGTCTG AAACAGAGCT GGGACCTGAT GAGTGAAAAT GAGCTCACCA 
181 GAAGAAAAAT CAAACAGGCA TTTCAGAGAT TGAGGCCAAG AAGTTAAATG TCTTAAATGG 
241 GCAGAGCTTA GCTGCTTGAT GTGAAAAGAG ACCAGCGTGG CTGGAACAGC AAAGGAGAAC 
301 AGCAGAAGAG GTGAACAGAG GCCAGAGATG GTCACTGAGT GGGCCCTTAA GTCATGGTAA 
361 GGAGTATGGA GAATGAATTA TTGCATGTAT TGAATATGTA GGTGACGTGA CTCACAGATA 
421 CTTTGGATTT GTAGAGATGA AGGAAATGTA GCAAGTGACA CTCTTAGAAT GTTGATTTGA 
481 GTAAATGGTA GTGTCAGTTA TTGAACTGGG GAGAACTGGA AGGGATAACA GGCTTAAGGA 
541 GCACGTTTAT TCCTGTGTCT TGGAAGTGTT TAGGGTGAAA GACCTATTAG AGTTCTAAAT 
601 GGAGATGTCA AGTGAAAATG TGGCTACACA CATTTGCATT TCAGAAAAAA GGTCAGGCTG 
661 GAGATGTAAA ATTGGAAGTT TACTGCATAT AGATAGTCTT TGGAACCGTA GTATTGATGA 

721 AGCCATTAAT GAGACAGAAC AAAGACTAGG GACCAGAGCC AAGCTCCAAG TTTCTAAAAT 

781 TTAGAGGATA GTATAGTCTG GTCATTTTGA GGTGAATACT TAATAACAGA ACAATTTGCT 

841 GAAGTGTAAA TTTAGAGCCC TACACTTTTA GCTCTGACTA TTAACGAATA CAGGAAAGAA 

901 TGGATATGGT TATCTGCCTG GTGTCTGTGA AATAATTTAA GCCAGGAAGA GATCCTCACC 

961 AGAAACTGAC TATGCTGGCA ACTTGGATCT TAGATTTCCA GCCTGCAGAA TTGTTAGAAA 

1021 ATAAATGTCT ATCGTTTAAG CCACCAGTCT GTAGTATTTT GTTATGGCAG TCCAAGCTGA 

1081 CTAAGTTTTG GTACCCAGGC GTGGGATGCT GCAACAACAA ATACCTAAAC ATGGGGAAGT 

1141 GGCTTTGGAA ATTGGTGATG GGTAAAGGCT GGAAGAGTTT GAGGTTCATA CTAGAAAAAG 

1201 CCAATTGTGA AGGGACTATT GAAAGAAATA TGGACATTAA AGGCAATTCT GGCAAAGGCT 

1261 CAGAAAGGAA GAGAGCTGGA CAGAAAGCTT CCATTTTCAT AGAAACTTAG ATTTATAACG 

1321 ATCATGGATA GAATATTAAA TATGCTGGTT AAAATATGGA CTTTAGGCCA GGCGTGGTGG 

1381 CTCACGCCTG TAATCTCAGC ACTTTGGGAG GCTGAGGGCA CAGATCACGA GGTCGGGAGT 

1441 TTGAGACCAG CCTGGCCAAT ATGGCGAAAC CCTGTCTCTA CTAAAAATAC AAAAATTAGC 

1501 TGGGCATGGT GATGTGCTTC TGTGGTCCCA GCTACTCGGG AGGCTGAGGC TGAAGAATCG 

1561 CTTAAACCCG GGGGGTGGAG GTTGCAGTGA CCCAAGATCA CACCACTGCA CTCCAGCCTG 

1621 GGATACAGAG CAGGACTCCA CTCCCCCCGC CACACACACA CAAAAAATAT ATATATATGG 

1681 ACATTAAAGT CAACTCTTGT GAGGTCTCAG ATGAAAATGA GGGACAGGTT ATTGGAAACT 

1741 GTAGAAATCA CTGTTCTTGT TACAATGTGT CAAGAACTTG GCTGAATTAC GCTGTAGTGT 

1801 TTACTGGAAA GAACTTATAA GCAGTAAAAC TGGATATTTA CCAGAAGAGA TGTCTAAGCA 

1861 AAGTATTGAA GGTGTGATTT AGGTCCTCCT TACTGCTTAA AGTGAAATGT GAGAGGAAAG 

1921 AGCCGAAATA AAGAAGGAAT TTTTAAGCAA AACACAATCA GAACTTGGAG ATTTGGGATA 

1981 GATTTCTCAA TCTATATTGT AAAAATTGAG AAAGTTTTTC TTGAAGAGGT ATGGTTGAAC 

2041 AATGTTTTCT TTTTCTTTTT TTTTCTTGGT TTTATTTTTA TTTTTATGTT TTTTGAGACA 

2101 GGGTCTGGCT ATGTCATCCA GGCTGGAGTG CAGTGGCACA ATCTCAGTTC AGTGCAACCT 

2161 TTGCCTTCAG GCTCAAGCAA TCCTCCCACC TCAGCCTCCT AAGTAGCTGG GACTACATGT 

2221 ATGCACCACC ACACCCTGGC TAATTTTTTG TTGTTGTTTA TAGAGATGGG GTTTTGACAT 

2281 GTTGCCTAGG CTGGTCTCTA ACTCCTGAGC TCAAGTGATC TGCCCTCCTC AGTCTCCCAA 

2341 AGTGTTGGGA TTACAGGCGT GAAACACTGA GCCTAGCCTG AACAACCATT TGATAAAGAG 

2401 ATAATGGGTG TGACCCAAGG ATTTAATCAG CCATCTCAGC AGAAGCCAGG AAGAGAGATG 

ItV: G ° ATTATTCC AGCAGAGACA CTGCCAATTT AAACTAACGT AGGCAGAGAA AACAGAAAGG 

III 1 : ^f^^ GGTTGTCGAC TTTTTGAATT CTATAGAACA GGATCATAGA GCTACCTGGC 

2581 TGTCAATGTG TACTATTCTT TAAGAAAAGG AAAGACTGAC CCACCAAAGG CAACTTACAA

2641 GATCACTAGG GCTGACTCTT TTTTGTTTTT TCTTGAGGCA GTCTCACTGT CACCCAGGCT 

2701 GTAGGGCAAT GGTGTGATCT CAGCTCACTG CAATCTCCAC CTCCCAGGTT CAAGGGATTC 

2761 TCTTGCCTTA GACTCCCAAG TAGCTGGGAT TACAGGCTCT AAATCTGTAC CCTCCCGAGT 

2821 AGCGCTCCTG CCACCACTTG CCCAGCTAAT TTTTGTATTT TTAGTAGAGA TGGGGTTTCA 

2881 CTATGTTGGC CAGGCTAGTT TGGAACTCCT GACCTCCAGT GATCCATTCT CATTGGCCTC 

2941 CCAAAGTGCT GGGATTACAG GCAGGAGCCG CCAGGGCTGC CACTTTGATG TCAGACTCAG 

3001 AGAGTACAGA TGGGATAGGG TGGGGGTGGG AACATGTAGT CAAGGCTGAC TCTACCTGTT 

3061 TCAAAGATGC CCTGCAGAAC TGTGTGGGAG TCTCTCACAG ATGGCTGCCT GGGTGGGACC 

3121 CCACCAAACT GAAAGACCGA GACTTCAGGC AGGGCAGATG GAGTAGGCCA ACTACAGAGC 

3181 CAGAGGTGAC ACTGAGACAC CACTGGGCCT GGAAATCAGG GCATCAAGCC AAAGAGGGTT 
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3241 
3301 
3361 
3421 



ss ssss sSo™ imsiT ™ «»«~ 

CATTGTACCT T^T^lll CTCCATTTTC TAATGGGAAT GTCTATTATG CCTG1TTCAC 
SESSSS S?SIc f° GTTCAAAGCT Q«2S£5 

3481 AGATGACACT SSJSSJ JSSSSSn I CACCCGTAA CCTGATTTAG ATGATTTTTT 

354a tgggatgga! SSiSS ESSSE * agaatgagt taagactttc AGGGGGCTGT 

360! GAGTGCAGTG ISSSST TAGCTCTGTC ^CAGGCTG 

3661 CATGTCTCAG CCTCCAgS AgSggSS JJSSS^ TCCCGGGTTT ATGCCATTCT 
3721 TTTTTTTTAT TTTAGTArar JSSSSJf ACAGGCQ CCC GCCACCACGC CTGGCTAATT 
3781 TGACCTTCTG SSSES SSS SSESS CCAGAACGGT "CGATCTCT 
384! CATGCCCGGC TGGGATGGAA S^aS SSSf TGTGAGCCAC 

3 901 GGTCAAGGAC AGAATGTTAT GGATT^^ rZzlt GAAGGACATA CATTTTGGCA 

3961 TAAACCCCAG tSSSJ SSS£2 AAATTCATTT ATTAAAACCC 

TCACAGGATA GGGCCCTAAT CcSSSS rrrr^Z GGGGTACATA AAACTAAAGA 
GAGCTCTCTC TCCACGCA^ S TACAGAAGAT ^GACACTTA 

S= A S ™ JS3SS SSSSSS 
SSSSS SESS sess SSSS SSSS g a ™ 

gctaagacaa tgaaggatgt ggtaaaactt tSctSca! cS^n™ GCCTGAGTAG 

AATTTAGCAT GCTTTCTTrT TTrTZ^r-it 1ACGTCCCAA CCACATACCA AAGAGGCTGG 
CATGTTGGCT cSSSS SS^IS GGCAATGTGC ACAAGTTCTA AATCCTAAGA 
CATCCAATGA aSSSS SSJSc SESSS* AATATAATAA 
CATTTTATTT TGAAAtSS CcSSSg CCAGAGAATT 

SSE 2SSSS ssss S ™ = 

IHPil 

5281 AATTTCTAAT ACTTTCTTTT ^rr^n^ll ^™ TATAAT TTTTTAAAA A TTGGTTATAA 
5341 AAATGATTTA cSSSS EJSSJ 0 ^GGAAAA TATAATTCTT ATAAAAGTTC 
5401 TTGCTACATA ™* GAGATGATGA ATGAATTAAA GG AAAGGATA 

5461 TGATCTgS SSSS GAAA ^GA TTGTTGATTT TGTGTTAAAC 

5521 CTCAgS JSSSI J—"™ CCAAAAAA ™ ATTTTATCTC AGCCTCATAT 
5581 AGACcS JSJSS SS™^ AGGTGCCTTT GGTAATTGGG 

5641 CATTCAGGCA SSSJS S2£^ CGCCCCAAT T TAAATAGTCC TCCCCAGGGC 
5701 ATTCATTgS SS S™?; A * GGAA ™A AGCTACCGAA 

5761 GGCATTTCAA AGTAGAAGGT SSn™ AAGTAACTAG GGCTTTTGAA TATAATAGTG 
5821 GAATGTCCTT wSSSS GGAGATGAGG AGACAGGACA GAGCTACGAG 

5881 CTGGCACCTT SSgcS C ™ G ° TAAG AACTGGTTAA 

5941 CCTCTTAGGT AAGAACTGGT SSJcSr GGACTAGGCT CTTAGCAGTA 

6001 CTGCCAATGA AATTTGGaS SJS^J SS*"™ TCTGAAGCTC CCAGAACAAA 
GTTGTTTTTT TlSSr^ ™^ ™ AGTTTCTTTT TTGTTGTTAC TTTTTGTTTT 
TCTCTtoS TCACTCTCAC TGCAACCTCC CCCTCCTATA TTCAAGTGAT 

SSSESS™ AGTAGCTGGG ACTACAGGCG TGCACTAGCA TGCCCAgSI 
iSSSSJ ™ GTAG AGATGGGG " GGTTTTTTTT TGAGACAGAG TTTCAcSg 

JSSS; S2£5 I, tggctcact acaacctcca ™ c ^ g ™ 

AACACCGCCA CAGTCTCCTG AGTAGCTGGG ACTACAGGCG CCTACAGGTG 

££gSS SSSSn TTTTATTAGA GATGGGGTTT CG CCATGTTG 

GCCAGGCTGG TCTCAAACTC CTGACCTCAG GTGATCTACC CACCTCAGCC TCCCCAAGTG 
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CTGGGATTAC 
AAAACACTAT 
AACCAACTCT 
CATTTTCTCA 
AGTATTTGAA 
TAACAAACTT 
TTGGAGCCCA 
ACAATAAGTC 
CTTTGTTTTA 
CAGTATTAAT 
TTCCTCCTTC 
GACAAAGTTT 
CTTAAGGAGG 
AACTTAAATT 
AACCTATCTC 
CCTGGCACAC 
GAATCTACAC 
AGAGATGGAG 
TGGTCAGTGA 
TTCTCGGACA 
ATGTGAAACC 
TAATTTAAAA 
TTATGCCACT 
TAGGGAAAGG 
GTGTCAGTAT 
ATCTCCTGCC 
GGAGTTCCAG 
CGCAGATGCA 
AAGCAGAAAC 
AGGCTGGCGG 
CCGTTTCTAC 
GTCGTCTGAA 
CCAGCTTGGG 
TGCAGAAACC 
TCTCTAGAAA 
CAGATCCCTA 
GGGCAGTGGC 
TGCGTCAGCT 
TTTTATTTAT 
GAAGTCGTGA 
TGAGCAGAAA 
GTAATGGCAA 
CCGTTCTACA 
TTTAGGTAAT 
AAAGTGAGGC 
CCAAACAAGG 
CAACGAAGTG 
TCGATCTTAC 
TACTATTACA 
GTAATACGCT 
TACCTACAAG 
CTTTGGGTTT 
TTGTGGTGAC 
CCCTGAGCAA 



AGATGTGAGA 
TAGCAACCTA 
CTACAACAAA 
ATGCCCAACA 
TAAGAGGGGG 
AACACAATGT 
GAGAGAAGAA 
AGTTGCACCA 
TTTGCCACAC 
ACATTGTCAA 
CCTTAAATTC 
ACCCATTATG 
TGTGGTTATA 
CTTTAAGCTG 
TTAGATTGTT 
AGTAGTGCCT 
TTGCTGAGCC 
GTAGGAAGAG 
ATAAAAAGGA 
TACAGGAAAT 
AGACCTTCAA 
ATACCCTCGG 
TTGTTTTCAC 
AGGGGGTGGA 
CTGGGAAGTG 
ACACACTCGG 
AAGCGTTAGA 
TAGGCAAGAC 
CGGCCGGGCG 
ATCACCTGAG 
TGGTGGCGGG 
CCCGGGAGGC 
CAACAGGAGC 
GAGATCCGGA 
TTTGTCCATG 
GAAGCAAAGG 
ACGATCTCGG 
TCAAGAGTAG 
TATTTTTATT 
CCTCAGGTGA 
GCAAAGGTTT 
CCTAGACGCT 
TTAGGGACAT 
ATACTCTGCA 
TGCCTACAGC 
TTACCAACAC 
TTTAGATCAC 
AAAGCATTAA 
TACAAACAGA 
TTGCTCAAGG 
CAGTGAGGTT 
GATAGCGTTT 
TCTCGGTCTT 
TGGTCACCCG 



CACCAGATCA 
TTAGTCTAAT 
GTGCTTCCTG 
GCCAAGTGTC 
TCTACATCTT 
ATCATTCACT 
TTGAAATTCA 
AGTCTTGTAG 
CCTAAATAAA 
GATTTACCTC 
TTCAGAGGTT 
TATGGATGTT 
GAATAGTCAG 
TTTCTTAGTT 
GGATTAAATG 
AATAAACCAT 
AGGTTCTTTT 
ATTAAGCCCT 
TTCCAAGGCC 
GCTGGGGGGG 
ATCTGATGAT 
AAAATTCTAA 
CCAAATGGGA 
GGGAGGGAAG 
GGAGGCGCGT 
ATTTGAAGGC 
CTAAACGACT 
TTAGCCCGCC 
CGGTGGCTCA 
GTCAGGAGTT 
CGCTTGTAAT 
GGAGTTTGTA 
AAAACTCCGT 
AGAAAACCTC 
GTCCCAGATC 
TTTTTTTGGG 
CTTACTACAA 
CTGGGAGTAC 
TAGTAGAGAG 
TCAGCCCCCT 
TTGAGTGGCC 
TGAGCTTCTT 
TAGTCTGTTT 
CTTTAGCAGG 
CTAAATTGAG 
GTTAGAGTTT 
GAGGCATCCC 
CTAGAATATT 
CCAACCTTTA 
TTGGCATAAA 
AGCTCTTCCT 
CCGGGAGCTC 
CTTAGGCAGA 
GCCTAGCAGT 



GCCTCAGAAG 
ATTTAATACT 
GCTGCCTAGT 
TCCTGTATGC 
AAGTACTGCT 
ACTAAATAGA 
AGTTTTCTCT 
CTCTTTACTG 
AATTGTACTG 
TTCGTGTAGA 
AGAAAGCCAT 
TTACTCTTTC 
CTGTTATAAG 
TGCTCATCTC 
AATTAACATA 
CTCTCTTATT 
CATTTCAAGG 
AGGCCAAGGG 
CATAAGGCAA 
GGAAAATCCG 
TCTCAGCCCA 
TATGTGGCTA 
CATCCAACCC 
AGCGGAAAAG 
CAGCAGTAAA 
TCCAAACGAA 
GGGTCTGTTT 
TAGACTTTTC 
CGCCTGTAAT 
CGAGACCAGC 
CCCATCTACT 
TGCAGTGAGC 
TTCAAAAAAG 
GGCGAGATTC 
TCCATTTCTT 
GGACCGTGTC 
CCTCCGCCTC 
AAGGTATGTG 
GTGTTTCACC 
CGGCCTCCCA 
ACAGGCCCCA 
AAAATACAAG 
TACAGACACC 
AATGGAACCT 
AAAAAAATAG 
TGCCTTCAAT 
TGCATGTAAA 
TCTTTAGAGT 
GTAACAGCGC 
ATTAACTTAC 
TTGAAACGGT 
AGATACCTGT 
AGCACGGCCT 
TTGTTGAGCT 



ACATTTTCTA 
TAATGTCTTC 
CATTGATTCA 
CAAGTTCTAT 
TAAGATGAAA 
CCGAATACAA 
CTCTCCTTTT 
AGCCATGTTT 
GCTTTTTTTC 
TTCCCTGGGG 
TAGTAACATT 
CATTTTTCTG 
TACTGTTTTC 
AAAATTCGGA 
CTGGAAGCTC 
CAGCCTGTTT 
TGAGCAAAAG 
AGCTGGAATC 
TTCTAACCTT 
GTCTTCTCAG 
GCTGCCCATT 
TCAAAGGTGA 
TTTTCCTTTG 
GCTGGATCCG 
CAGCTTCTGC 
ACAATGCAAA 
GGCCAGTCTG 
TGCCCACTTA 
CCCAGCACTT 
CCGGCTAACC 
AGGGAGGCTG 
CGAGATCGCG 
CAAGCAAACA 
ACAGAATCCA 
GTGGGTGGGG 
TCACTGTTGC 
CCAGGCTCAA 
CCACCACGCC 
ATGTTGGCCA 
AAGTGGTAGG 
CTCTATTTCC 
AGTAAGTTGC 
TTTCAACTCC 
ATAACTCTCA 
ACGGGGGACT 
TTACATTTTT 
CTGTTAGGCA 
ATGATAGTAC 
TCCCCAAAAA 
CTTAGTGCCT 
AGGGGGGCTC 
CAAATCACTT 
GGATGTTAGG 
CCTCGTCGTT 



TTGGAAAGAG 
CTTAGTAATA 
TTCAGTTCAA 
GCTGATTATC 
GCCTCTAGGT 
AATCTTGTTA 
CTCACTCACC 
TCACGTGTCC 
CCTGGGTTTA 
AAAATTACCT 
CTGGTATGTG 
ACAATAATCT 
CTGGCCTTAC 
ATAAGGATAA 
ATGAAATGTG 
TCTGATTTCA 
CATACAAGGA 
AAAGGCAATT 
AGGATCGAAA 
CCCAAGAGCC 
AGAATCGTTG 
TCATTTGCTT 
AGAGTAGTTG 
CCCCGAGCCG 
TAGGATTATT 
ACGCTTCAGT 
AGCAGCTGGG 
ATTCCGATCA 
TGGTAGGCAG 
TGGTGAAACT 
AGGCCGGAGA 
CCACTGCATT 
AACAAAAAAA 
GGAAAATAGG 
CAGCTGTTAC 
CCAGGCTGGA 
GCGACTCTCC 
CAACTTATTT 
GGTTAGTGTC 
ATTAGAGGGG 
TTTTCTGCCT 
ATGTCAGGCA 
CTGGTTAACT 
CAGAATTAGG 
AGTCGGAGGA 
AAAGTAATCA 
CTAACTATGG 
GTAACTGACC 
CCGAAAAGCA 
TTTTTCCTTC 
TGAAAAGAGC 
GCCCTTGGCC 
AAGGACGCCG 
GCGGATGGCC 
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AGCTGCAAGT 
AGCTCCAGGA 
GCCCCAACCC 
AACTGGAGAC 
TTACCACGTC 
GAGAAAACAA 
GGGAGTAAAT 
AAGCTGTACT 
GTCTTCCAAT 
GCAGAAATCC 
AAGTCTGCTC 
GATGGCAAGA 
CTGAAACAGG 
TTCGTTAACG 
AAGCGCTCGA 
GAGCTGGCCA 
AAGTAAACAT 
CCAGATACCC 
ATTAGAATGT 
GGGTCCTGAA 
TTAAATTTAA 
CAGGCTCGCT 
GTTGCCGTAA 
AAATATTAAC 
TGGGAACCTG 
CTTCGTAGTA 
TAACCTAATA 
GCACTGCGCC 
AAATCAAATC 
TGATTGAAAC 
TTTAGGAGAA 
AAGGCCAGTA 
AGCACATACA 
TGGCCTGGGA 
TTTTAAGATG 
TTTAGCACCT 
AGTGTTAAAG 
CATAAGGACT 
CCTGTCTTTG 
CTTGCCTAGA 
CAAGTAAATT 
GGCTGGAGTG 
ATTCTCCTGC 
CTAATTTTGT 
TCCGGACATC 
CCACCGCGCC 
TATTCCCATT 
TCTGCAAACA 
GTTTTTGCCT 
CAGTAAGCAA 
TTATGTGAGT 
CCCTGGACCC 
CGTCCGCGTT 
TGGCAGTAGT 



GGCGCGGGAT 
TCTCGGCGGT 
GCTCTGCGTA 
CAGCGCGAGA 
CAGACATTGC 
ACAAAATCAA 
CCGACTTTTT 
TTCATACCTC 
TAACTAAGAG 
GCTCTTTACT 
CCGCCCCGAA 
AGCGCAAGCG 
TCCATCCCGA 
ACATATTTGA 
CCATCACCTC 
AGCACGCCGT 
TCCAAGTAAG 
ACTAAAAGAG 
AGGAACTGGA 
CCCGAAAGAA 
AATGGGGACA 
TAGGTTTCAG 
TGTCATAATT 
CAATCGAGGG 
GGCAGTAACT 
TACTGAAGGG 
TGCGTCAGTT 
AGATGTTGCT 
AAATTTTGCT 
TTAAAATCTC 
GCCAACTCTT 
AGGACTAGGC 
CTGTGTCTCC 
AATTCCACAT 
AAGGGTTAGA 
AGAAGTTTGC 
CAGATTTTTA 
GTGTGATCTT 
GTGCCAGACA 
TAACTTATCT 
TTTTTTTCTT 
CAATGGCGCG 
CTCAGCCTCC 
ATTTTTAGTA 
AGGTGATCTG 
GGGCCTAAAT 
CAGACTGACC 
AAATTCAGTA 
GTGTTAGATG 
ATTATTAACT 
CATACAATAA 
TTTGAGTTTT 
TGTTTGTTTT 
AGAATTTGAA 



GATGCGAGTC 
CAGATACTCT 
GTTGCCTTTA 
AGAGCGGGAT 
AATCAGACAA 
GAAATATGTA 
GATTGGTCGG 
ATTTGCATAG 
GTACTCTCCA 
TTCGACACAT 
GAAGGGCTCC 
CAGCCGCAAG 
CACTGGCATC 
GCGCATCGCG 
CAGGGAGATC 
GTCGGAGGGC 
CGTCTTAACA 
CTGTGGCCAG 
GAGGGGTGGG 
GCCAGCCATT 
AGCGGCCATT 
ACCCAGCTGT 
TCGCCACCAG 
AAAGCTGTTT 
GCCTAAGGAA 
TGTGTCTCCT 
TTGATAACAA 
TCATACATCT 
TGAATCCCAG 
CGTAGGGGGC 
AACTGCTGGG 
GCTGGGTGGG 
TAGAGGACTC 
TCCCTTAAGT 
CGTAGTCTAC 
TTTCTCCATT 
CAAACTTAAA 
AAATCTGCAA 
AGGCCTTATA 
GAGAAATTCT 
TTTTTTTTTT 
ATCTTGGCTC 
GGAGTAGCTG 
GAGACGAGGT 
CCCGCCTTGG 
GGTTTTTTTT 
GCTCTCCTAC 
TTCTTTCCCC 
AAATAATTCT 
TCTTGGTCAT 
GAACCAACAG 
CTGTTCACTT 
GGTTATTCTA 
TTCTGGTTTT 



TTCTTGTTGT 
AACACCGCCG 
CGGAGCAGGC 
TTCGCTTTGG 
AAATCACCAA 
AAACATGGCC 
TAGCAAATGC 
CTCTGCCCAC 
TCCCTCATTA 
TTCTGGTGTT 
AAGAAGGCAG 
GAGAGTTACT 
TCTTCCAAGG 
GGCGAGGCTT 
CAGACGGCCG 
ACCAAGGCCG 
CCTAACCCCA 
ACGCCAAATT 
GACAAGTGTT 
AAAAATGGGT 
TTGCTAACTC 
CTGTCCCTGT 
CTTCTAGCCA 
TGAGACTCTG 
GGACTCCCCC 
GGGTTTCCAA 
CACTAAGGCA 
TATTCTATTC 
TGCTCAGTCA 
TTGTAACATG 
TAAATTGACA 
GGAGAATGAA 
TCCCTTCCTA 
ATTTTACTCA 
CTATCTTTTT 
AAAAACCGGG 
TACCATGTAA 
TTTCTTTCAC 
CTTGAACACT 
GATGAGAAAT 
TTTGAGACGA 
ACAGCAACCT 
GGATTACAGG 
TTCTCCATGT 
CCTCCCAAAG 
TTTTCTATGC 
CTGCCAACTA 
GCCTTTTCCC 
ATTGCTTGTT 
TTATTTCTGA 
AAATGTGTGT 
TCCTTTGGCT 
ATTGGACTTG 
CTGGTCACAT 



CGCGAGCCGC 
CCAGGTACAC 
GGTGCACTCG 
CGCGAGCTTT 
AACCAGCAGC 
GCTTTTATAG 
TAGTCAGATA 
GGATGACAAC 
GCATAAAAGC 
TTAAGATGCC 
TGACCAAAGC 
CTGTGTACGT 
CCATGGGCAT 
CCCGCCTGGC 
TGCGCCTGCT 
TCACCAAGTA 
AAGGCTCTTT 
TTATTTGGCG 
GCAGCTTAGA 
TTGGGGTCAA 
GGCGTTCCCG 
CTACGTCGCC 
ATAGGCTGTC 
ATTTACATAG 
TCTGTTTTCG 
CTGCCCCGGT 
GTACAGAACT 
AACTGGTTTA 
GCCATAAATG 
CAGAAAAGTT 
AGCCTTCGAA 
GAGGAGACGT 
GACAACTGCA 
TGGTCTTTTC 
ATTCAAGTCT 
AATATACAAT 
TTTAGGTTAC 
ACCTGGGAAA 
GCTGTGCAAT 
GAAATTTCCA 
AGTTTCTCTC 
CCGCCTCCCG 
CATGCGCCAC 
CGGTCAGGCT 
TCCTGGATTA 
CTCTAATGGA 
ACTAATCAGT 
CTTTCTCTTA 
CTCTCTTCTG 
ATTTTCCACC 
CTTGGAAACA 
TTTGCATGCT 
GCTGATTGGT 
CATTAAGTGA 



GTTGCCGGCC 
CGGCGCGCCT 
GCCCACCGGG 
GCCTCCTTGC 
CTAAGCTCAC 
GTAGTTCCTG 
GCCAATAGAA 
TGTGTAGTTT 
CCTATAAGTA 
TGAGCCAGCC 
GCAGAAGAAA 
GTACAAGGTG 
CATGAATTCT 
GCATTACAAC 
GCTTCCCGGA 
CACCAGCTCC 
TAAGAGCCAC 
GCGGAGGGGT 
GAGGGACAAA 
TTCGTTGTGC 
GAAGAAACCG 
AGGATCAACG 
CTGTCATTTT 
CGGACCGGAG 
TGGCGCACAC 
AATAGTCTTT 
AAAGATGTAA 
TTCAAGATTC 
GTGTGTTGCC 
TGAAAGTTGC 
CACTGAACTG 
CATTAAACTT 
GGCCGCTTTG 
CAGGTAAAGA 
AGAACACGTT 
AAATAAAATT 
AGTTACTTAA 
TAAACTAAGG 
CACAGGCTGC 
GAGTCCCTCA 
TTGTTTCCCA 
GGTTCAAGCC 
GACACCCTGG 
GGTCTCGAAC 
CAGGCTTGAG 
CCTGGTCACT 
GTAACCAAAA 
CATAGATTAT 
TACAAGTACC 
AAGACAGTGT 
GGTTGTCTAT 
AAAAGTTTAT 
TGCATATTGG 
TTAGTCAGTG 
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GAGAGGACAG 
TGTTGATATT 
TGGGATTTGA 
AGTCCAAATA 
TAAAATGGCT 
AAAAAGGAAA 
CAGCTTATTA 
TTAGAGGACA 
AGAGTGCTTT 
TATTGTAAGG 
CAGAATTATT 
ACTTTTATTT 
CACTTTAAAT 
TATCTTTTTT 
CACAATCTCG 
CTCCCCAGTA 
TAGAGATGGG 
CCTCCCAAAG 
TTATGAATTT 
TCTTAAATTT 
TTTGTCTAAA 
CTAATTAAGA 
AAGTATTTTG 
TTTTATTAGG 
GACAAATTGG 
TAATATTAGC 
ACAAAAGCTA 
AGTTAACATC 
TGATAATAGA 
ATTCAATGTG 
GAATATTATT 
TTGGAGGGTA 
ACACAGTTTA 
ATACTGTTGC 
AGGACCATGC 
TTGCCTGCTT 
TAGCTACAGA 
TTAAAAAGTT 
AAAATAAGAC 
ATGTAATAAC 
CTTCAAACAA 
AAACACATTT 
ATTAAAAGAC 
GACGAGGTCT 
CCTTTTCTCC 
AAAAAACAAG 
TTTTGCAGGT 
TTACTAGAAA 
AAATGAACTT 
AAGCCAATTA 
TACCCTCATA 
TGTTGTTACA 
TTAGACTTGC 
GCCAGAAATC 



GAAATCTGGT 
CTCTGTGAGG 
TGTTTTGTGC 
GTTGTCGATA 
GCCCTGTTAT 
TCAAGGAAAC 
TTAATTTTAG 
GAAGAAACAT 
CAATATCTGA 
GATGTGATGC 
CATATTCTCA 
CTTTAACTGA 
TCTGTTCTAT 
TTTTTTTTGA 
GCTCACTGCA 
GCTGGGATTA 
GTTTCGCCAT 
TGATGAGATT 
AAATAATTGT 
TAGTTGGCTT 
AAAAAATCAA 
GAAAAAAAGT 
TAAAAAAAAT 
TTTTTTTAAT 
CTTAATAATT 
AGAATATTAT 
ATTTAACTTG 
ACTTTATTTA 
TAATGTCATT 
TGAGCTTAAG 
AAATTGAGTA 
CAAAATACAA 
GAATAACCAT 
TTTCGCCACT 
AGGTTTTGGA 
TGTTTAAGGG 
GAAACACAAG 
GTTACTGTTT 
TTCAATCTTT 
CAATCTTCTC 
CAAATACTGC 
GGAATCTCAC 
CTCCAGAATT 
GAAATAGACA 
ATTATCTGTC 
CAAATAAACA 
TTGTAACAAG 
ATTTATTTCT 
GTTTAACTAA 
AATTCTTGGA 
ACTTTTTTTT 
AAGCCATTGT 
TCCTTTATGA 
GTGAAGACAT 



TTATTTATTA 

ACACAGGGTT 

TTGTATGCCT 

TCTGCAAAAC 

AACTTTTGAC 

CAAATGTCTG 

TAATTTCACA 

AATGTTGTTA 

ATAAAACAAA 

TGGAAACTAG 

GCAGTGGTGC 

TCAACATGCT 

TAGCACGGTT 

GACAGAATTT 

ACCTCTGCCT 

CAGGTGCACC 

GTTGGCCAAA 

ACAGGCGTGA 

GAAATTATCC 

ACATAAAGAC 

AAATTTTCCT 

TTAACTGTGA 

ACTTCACAAT 

AAGGAAAATA 
TCATTTTAAA 

AGTATACACA 
CATTTACTAA 
TTATTCTAAA 
TTTAAAAATG 
TACTGAGTTC 
AATTAATTCT 
ATCACAAGAA 
TGATAAACAG 
TTAGATTTGT 
TGACTGCCTC 
CTATGGTTAA 
TAAGCATTCG 
GTTAATGTGG 
TTCTTATTTT 
TGACAACATT 
TTTTATACTT 
TGAGAAATAC 
CTGGAAGTAG 
GCTTCTTCCT 
TTTCCAGTGA 
AATCTCAGTT 
GACCTTTATA 
GCCTGTGGCC 
AGTTGGCCAA 
GACAATTTGT 
TGCCCTACTT 
CAAAAAAACA 
GATATTTTTA 
GGCCTACCTA 



ACCTTTTTTT 
AGAGTTGGTG 
CTTTCCACCT 
CAGTATTCCT 
TTTAAGAAAG 
GTCTCAATAA 
TTATTGCCCC 
CAAATTGGAC 
GATTTAATAT 
GAAACTAGAA 
CACCTGAGGG 
AAATAGATAA 
AGCTTTCCTA 
TGCTCTGTGG 
CCAGGGTTCT 
ACCACGCCTG 
CTGGTCTCGA 
GCCACCGTGC 
ACTTAAGGGA 
TTAAAATACA 
TGTGCTTTAA 
GTTTCATTAG 
TTTTAAATAA 
TATAATACAT 
AATGGCTTCT
AGTTTAGGGT 
ATTTCTTCCA 
ATTGTAAATT 
GAATTAAATT 
ACAGTGTATG 
CAATCTTTGG 
ACAGTGTAGT 
ATAAGAGAAC 
AAATCATGTA 
TGTTTTCGTC 
TCCAAACAGC 
AGATAATGAC 
TACATTCAAT 
TATATAGCCA 
ATAACAATGC 
CAGAGCAGAT 
ACTATCACTA 
GAAGTTTCCT 
TCTTTTACCT 



TGAAATTTTG 
ATATTTTACT 
ACTTGACTAA 
CACATTTGAG 
ACTGATCTTT 
ACTTTAAGGA 
CTGTGCTTCT 
AAAAACAAAA 
CCAAAAATGG 
ACTTGGAAAT 



GGGGTGTTTT 
TTTTTCTTTC 
TCCAAAACTT 
GTGTTAAGAT 
TGTTAGGACT 
CTGCTATGGC 
TTCACGTTCT 
TATTGAGTCA 
TTTCTAAACC 
TTTTCTTCTA 
ACTTCTGATC 
CCTATGGCTC 
ATTGGCAATA 
CCCAGGCTGG 
AGCAATTTTC 
GCTAATTTGT 
ACTCAGGTGA 
CCAGAAAAGA 
ATTAATAAAT 
TCAATTTAAA 
ATGTGCTACC 
TGGTCTTAGT 
CTTAAAAATA 
CTAATCAAGA 
TTATTCTTAT 
TCATATTCTA 
CTAGTTGTAC 
ATTCATTGAA 
TTATGTTACT 
ATAACTTTAA 
ATACCTGGAC 
TTTATGCAAA 
ATATGATTGC 
CTGTATACGT 
ATGCCTATGC 
TCTGACTCTA 
TACCTTGAGC 
TTACTATGGA 
TGATTTATAT 
TGGAACCTCC 
GGATATGTGC 
AAAATACAGT 
CTTCAAAGTC 
GTGGTATTAT 
ATCTGGCCCT 
AAGATATTGG 
AAGTTCCTAA 
TCAAAATAAT 
GAGACCTATT 
ATTCTTATAA 
CTAATATGCA 
AACTAAACAA 
AGGAGTTGAA 
GTTGGTTGTC 



TGTTTGAAGA 
TGACTTTACA 
GTCTTTTTTG 
GATATGAATA 
AACAGGAGAC 
AGAGGCTCTA 
TTAAGTAAGG 
GGAAAAAAAA 
TTAACGAGTT 
AACTGAGAAT 
TTAATTACAT 
TGTTTTTACC 
AGATTGAGAC 
GGTGCAGTGG 
CTGCCTCAGC 
GCATTTTTAG 
TCCACCTCGG 
CTATCTTATT 
TATAATGTAA 
TAAAAACTCA 
TCTTTAAGTT 
TAACAGCTTA 
TTAATACCTC 
TTATTTTTTG 
ACTGTAAAAA 
AAAAACAAAA 
TGGTTACATG 
CCAAATTAAA 
AATTATAAGG 
GAATTTAGGT 
AATTTCTAAA 
TAACATTTTT 
CTTAGAATAG 
GTGGGCGTAG 
GGGAACACAA 
TCAAGTACTA 
CTTTACTTAT 
TTGTCACTCT 
TCATATCTTA 
ATTTTCAGTA 
TTCCCAGTGT 
TCTGAGATTC 
TACAGAGGAA 
TCTGTTTTGT 
CCCAAGTATT 
CATGCTAACT 
ATAAGAATAT 
CAATTAGGAA 
CATCTAAGAC 
TATTTGTAAT 
GATTATTAAA 
ACTCACATGG 
AAACTCTGGT 
AGTGGAAAAT 
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ACTACACAGA 
AATTGTTTCT 
GAGTATCCAT 
TAGACAGTTT 
TTGTTGAGTG 
AAAATTAAAT 
GTAATAATTG 
CACTAGTTCC 
ACAAAAAGTG 
GATGTGGGAC 
CTTTACAGTG 
ATGCATATAA 
GCTCCTAAAA 
TAAATTATAA 
TACCGGGGTC 
CTCTCTAGGC 
CAGATTTGGA 
AAAAGACATT 
CAAGCATAAA 
GGCTCTGAGT 
GTTCATTATA 
CTTGACCTCA 
TGGAGTGTAG 
CTCCTGCCTC 
TAATTTTTTA 
AAACTCCTGG 
GTGAGTCACT 
GGGACTTTGG 
AGAATAATTA 
GATCTCCTTG 
CTATGAGGAA 
TTAAAATTCT 
TCAAGAGAAA 
CCTTCCTTTA 
CAAACTCAAC 
GTTAGACTTG 
CATTTGAGCT 
TGTTTAAAAA 
CTAGCCTCAA 
GTTTAGAGAG 
CTTCGTCGCT 
TTATCCTTCA 
TTAAGAGCGT 
CCAACAGCTG 
CGCAGGGAGA 
TCTATTGTGT 
GCCATCATCA 
AGAAGTGAAT 
TCTTTTCCAA 
TGAGTGATCC 
ACACCTGGCT 
ATAGAAATTT 
ATTCTTTTTA 
GTATCATATA 



GATAGCCATA 
AGAGAATCAC 
ACTAAACTCT 
GTAGTTTTTT 
AAATCAGTCC 
GTCTTTCAGT 
CCCTACTCAT 
AGACGTGGTA 
AAGCTTCTGG 
TCTGAGGCAG 
ATGGTAATAG 
ACCACTGTGT 
GGACTTGAAG 
GAATTTCATA 
CAACAGGTTG 
ATATTCCTAA 
AGGATATATA 
AAAAATTAGT 
ATTAAATTGA 
AATTTCTTTG 
ATTAAGAAAA 
GTTCTTTTTT 
TGGCGCAATC 
AGCATCCTGA 
AAAAGTTTTT 
GCTTAAGTGA 
GTACCCCGCC 
TTTGCTGATT 
ATAGAGACAT 
CTGCTGGCTC 
ATAGACCTAT 
AGGCTTATTC 
TATGAATAAA 
TTTTCTTGTC 
TGTAGGCTAG 
CTTAAACAAT 
TCAGTGCACT 
ATCTGCAGAG 
GAGTGGATCA 
TGTGCTCAGG 
GTATCTTCTT 
AGTTTAGATC 
ACAGACATTC 
TGCTACCTGG 
ATGACAGTAG 
AAAGTGCTTA 
TTATCATTGT 
CAATCACTCT 
CAGTCGTCAC 
TAGAAGAAGA 
GAGAAAAATT 
ATGACACAGG 
TATGTATATT 
TAAAATAAAT 



GTGCTGCACA 
TAATTGTTTT 
TTTCTACTGA 
TCTCCCATTT 
ATTGCTTGAT 
CACAGTTTGA 
AAAGATGGGG 
TCATGCTAGT 
AGACAGACTC 
GTCATTTAAT 
CACCTACCTT 
TTACTGCTGT 
CAGCTTATGA 
AATTATTTGA 
AGAAAAAATA 
GGACTTAAAG 
TATTCAGCAC 
GAAACTTTTC 
GTAGAGTATA 
GGGTCTGAAG 
AGGGAGTAAA 
TCAGAGACAG 
GCATCTCATT 
GTATCTGGAA 
TGTAGAGATG 
TCCTCCTGCC 
CCACTTCAGT 
TAAAGATTCA 
CTGGTCTCAT 
AGAAGGGTAA 
GTAGAGGAGG 
TCTGACCATA 
CTTTTGTTTT 
CTTAGTTTTC 
AACAAAAAAA 
TGGGGTAATG 
GAAATAAATA 
AACAATACAC 
AAGATGCTCA 
GTTCTAGGCT 
TATGAAAAAC 
AAATGGAACT 
AAGGGCTAGA 
GAAACTTAAC 
GTATCTCATA 
AAACACTGCC 
TCTCAGAGTC 
CTCTCTTTTC 
TGCTGGACAC 
TAAATGGAGG 
AGCTCTTTTT 
AAACATAAAG 
ATATATACTC 
TTAGGTGTCA 



GCCAATCTTA 
CTTTTAACAT 
AAATAATGTG 
CTATTTTATA 
ATACCTTGAG 
CAAACTCAAC 
TGAAGATTAA 
AAAATGGCTG 
CAAGTTTGAC 
CTCTCTGTGC 
CTAGAAGTAT 
TTGACAAATT 
CTGAAGACTT 
TATGAAAATG 
CACTTTTTTT 
AATGATAACT 
ATTGACAGAC 
CTACCTTTAG 
CCACTGTAAC 
ATCAGTTTGA 
TCTGGAGAAT 
GGTCTCACTT 
GTAACCTCCA 
CCACAGCAGG 
GGGTCTTACT 
TCAGCCTCCC 
TCTGAGGAGG 
TGTAACCTTA 
GTTTCTACAG 
AAGAGCAGAA
CTACCTGTGG 
TCAAGTTTTC 
CACTTTTCTC 
TTTTCACTTT 
AATTGAAAAT 
AACCTTGGAC 
TATTTTTAAC 
GTTGTGAGAT 
GCAGGCAACA 
CTAAAAATCA 
ACTAAGTCTT 
TTAGGACACT 
GGATGTGGGT 
CTCTCTGTGC 
AGGTTGTTGG 
TGGCACAGAG 
AAATACAATA 
TCCAGGGGGA 
TGTTTCATCT 
TATTTTGAAC 
TCTATGCATA 
ACAAAATTAA 
ATATTCATAT 
TGATATATAT 



AGTGTTTCTA 
TCTTGGTTTA 
CAAACATAAC 
AATCATCTTT 
CACAAGTAAA 
TACCCTGAGC 
ATGAAATAGC 
CACAGCACTG 
TCCCAGATCA 
ATTAGTATCC 
GTGAAGATTA 
TTATTTATAA 
TGGTAGGAGT 
CCAGTTGATC 
CCCTGAACAT 
ATCATTTCTC 
AATCCCAGTA 
CCTGTGTAAT 
ATTTCCTGAA 
CATATCCTCA 
GAGCCACTTT 
TGTTGCCCAG 
CCTTCTGGGC 
TGCACACCAC 
ATGTTGCCCA 
AAATTGTTGG 
AAAAAATATG 
TCATCCAATG 
TTGCTCATGC 
ATGATGGGGC 
TAAAACCTTA 
AAATGGTAAA 
CCTCCTCTCC 
TTTGTCTACT 
TAAAATGTGC 
ACTAGATTTT 
AATTAAAAAA 
CTTGAATGGA 
GAGTAAGAGC
GACAGTCCCC 
TTTCCTCACT 
GACTAGGTTA 
TTACTGCACA 
CTTAATTTCC 
AACAACTAAA 
CAAACATCCA 
TCTCATATCT 
GACAACAGCT 
TGCAAATAAA 
AATCAAAGAA 
AAACTATTAA 
AATAACTCCT 
ATACATATAT 
TTAGATAAAT 



GAGAATCACT 
TACAAGAAGA 
ATCCTATTCC 
TTAAAATACT 
TAGTATGCCA 
CTATAGAGTG 
ACCTATAGAA 
CTCAATGATG 
CCACATATAA 
TTCTCTATAC 
AAGATCCTTA 
CCATCTTTAC 
TGGCCTTCTA 
ATAGTATGTT 
ATGAAATTAG 
TTAAATCTTC 
GTCCTAAATT 
CCTGGATGAC 
AGGTATTCTA 
AGTATCATGA 
CTTACTACTC 
GCTGCCAGGC 
TGAAGCCATC 
CATGCCAAGC 
GGCTGGTCTC 
GATTACTAGT 
TAATAATAAT 
CGCAATTTGT 
CTTGATAGTA 
TTCTCTCATT 
TCCTCATCAC 
AGAATTGGAT 
CCCCATTCTC 
ATTATTTGCC 
CCCTTTTGTT 
AAAACACACA 
TAAAATTGCA 
AGGAAAACTG 
ATGTTGGAGG 
ACGGCCTGGC 
GGATAAATTT 
CATTCATCTT 
GGCTCATTAT 
TCATCTATAA 
TGCATTGGTA 
GTGAACTTTA 
GATAAATTAC 
TTTAGACATA 
CCAATGAAAA 
GGACAAATGA 
AATATTCTTC 
AGTATCTCCT 
CTCACATCAT 
ATACTTAGAA 



Figure 9 (Page 6 of 74) 



WO 98/14466 



PCT/US97/17658 



95/162 



19441 

19501 

19561 

19621 

19681 

19741 

19801 

19861 

19921 

19981 

20041 

20101 

20161 

20221 

20281 

20341 

20401 

20461 

20521 

20581 

20641 

20701 

20761 

20821 

20881 

20941 

21001 

21061 

21121 

21181 

21241 

21301 

21361 

21421 

21481 

21541 

21601 

21661 

21721 

21781 

21841 

21901 

21961 

22021 

22081 

22141 

22201 

22261 

22321 

22381 

22441 

22501 

22561 

22621 



ACTTTTTTAT 
CTTCAATTGA 
TACATAAATC 
AAATCAAAAG 
CAGGATCATA 
TATTTACAAT 
TTGAAATTAC 
TAGCCATGAC 
TACATGGGAA 
CAGGTGCCTT 
CTGAAATTAC 
GTTTACTATG 
CCCAATATGA 
ATTAGGATGG 
ACTTGGTCTT 
GAGCTCTGCA 
CTACAAGCCA 
TTCCAGCCTC 
TTTTATGGCA 
CTAAGACTAG 
AGGTGAGAAG 
AGATAGCCAC 
GATGTCTTAC 
TGTAAAACAT 
CACTTAACTG 
TATGTGGTGA 
AAATAAATAA 
ATACCTGGAA 
GTTCCCATGC 
TCATGAAGTC 
TTATGGATAC 
TTTTTTTCTT 
AGGCTGCCCA 
TTCAACCTCA 
AAAGGGTAAA 
AAAATTTAAG 
GCAACAGAAA 
CATTAACCTC 
GAAATGTTTT 
AAACAACATG 
GAAAAGAGAG 
TGGGTGATTT 
TTAAATGGTG 
AGATGTGCAG 
GAACAGAGAG 
TTATTGTCTT 
TTCATCTCCC 
AAATTTTGTG 
TTACACTTGT 
TTCTACAAAA 
AATACTTGAA 
GTAAAAATTA 
ATATGAAAAG 
TATGGTAAGA 



GGATGTATAA 
TTCCCATTTT 
TTTGTTCAAA 
TGAAAAACAT 
CCAATTTATA 
AAATTTTAAA 
ATTATTTTAA 
CCTATAAGAA 
GGCACTATAT 
TTCTGCGGAG 
TTTTACCTAC 
GACTGAATTG 
CTCTATTCCT 
GTTCCTAACT 
CCAAATTAAA 
CATATACTGA 
GCAAGAGAGC 
CAGAACTGTG 
GCCCAAGCCA 
GAGAGAGAAA 
TTGACCTTGT 
CTCACAGTCA 
CAGGAGACAA 
TGAATCTCAT 
ACAGTGATAA 
AACAGGTGCT 
ACTGGAAGGA 
ATTCGATTGG 
ACCAGGCACT 
GACATGCTCA 
ATTGGGCCAC 
GATTGGCAAG 
AGTATGCAGG 
TGGTCATCTG 
GGGAAAGGGA 
AAGAAAGACT 
CGTCACCTTA 
CACGTGGCTT 
GGACATCAAG 
ATGTGGGTAC 
GTTGAAATGT 
TCAGCTATGC 
GCCCTGATCT 
TTTATAAATG 
GACACCTTCT 
GGACATTGAT 
AAAATAGATG 
TGTGTGTGTG 
TAGATTTTTA 
CAGACAAATT 
AATGAAGAAA 
TTGCCAATCA 
GGACTACTCA 
GTGCTGTCAA 



TTTATGGATA 
TATGCATTAT 
TATTATTTCC 
TTTCTAAGGT 
ATCCCAAAAT 
AATCACTGTT 
TGACTCTATT 
ATAAACTGCA 
AAAGAATAAT 
GACTCTGAAG 
ATTGTCTCTT 
TCTCCCCATC 
AGACAGGACT 
GGATAGGATT 
TAATTTATTT 
GGAAAGGCTA 
CCTCACCAGA 
ATAAAATTTT 
ACAAAGACAG 
AGTTAAACTT 
TCTCCTCAAT 
ACAGCCAAAT 
ATGCCTCATC 
GAGAAACAAA 
AAAGCTTAAT 
CACGCACTGC 
GTTATGCTGT 
CCATGCATCT 
GTAATGGGAC 
TGGAGAGGTG 
ATTTACAGAA 
AAGGCTAGGC 
TCTCTTCTAT 
CAGCATGTCT 
AGTAGGCATG 
TTCTGCTTTT 
AATTCTAATG 
GGAAAAATTA 
TCTGTGTTGT 
ATTTCTTTAC 
CAGGTGGAAC 
TGATCTTTCT 
TAGTTCCTCT 
AGTAGCAGAA 
CTGCTATACT 
TTAAGCACAT 
GTAAATTCTT 
TGTTTTTTCT 
GAGACAACTT 
AAATACTCAG 
TCATTTGAAC 
AATATAAAGT 
TTTTAAAAAT 
GTGAAACCCT 



TATTGATAAT 
ATTATAGATT 
TAAGGATAGA 
TCTTAACATA 
AATATGAAAA 
AACCTAATAG 
AGTGAGGGTC 
CTGCAAAATG 
ACCTTAGGTT 
GGATACTAAA 
ATAAACATTA 
CCCCCAAATT 
TATAAGAGGT 
GGTGGCCTTA 
AAAAGAAAAA 
TGTGAGCTCT 
ATCCAGCCAT 
GTTGTTTAAA 
CATCATTGCT 
GTCCAAGGTC 
CCAAGGCCAG 
GTCCACACCC 
TTGAATAAAT 
AATGCAAAGT 
GATATCCTTA 
TGATAGACTG 
ATGTTTACTT 
ATTTCTTCAA 
AACTGCACAT 
CTACCCACTA 
ATTCACTTAC 
TGTTTTGTTG 
CATCCTGTGT 
AGGGGTCATA 
TACCATTTTA 
CTCTGACTAT 
TTTTTCTCTC 
TTTCAGTCAT 
TAGCATTATA 
TTACATATAA 
AGAAATAAGA 
TCTGGGTCAG 
CTCCTCTTAG 
ACCTACTGAA 
CTCTCAGTGA 
AATAATTGTT 
TAGTTTAGAG 
GTGTCTCTCA 
TTACAAAACA 
TAGTTGAACC 
AGAGTTAAAG 
TCAAAAATAG 
GTTAGATATC 
GCTAATCTCA 



TATGTATTTG 
ATATAGCTCA 
CTTCATGAAG 
TACATTGCCA 
TTCCTGTTTT 
TCCTTCAAAA 
ATTCTTCCCA 
ATAAACATGA 
AAGGCCACAT 
CTGCATTTAG 
TAACTACTCT 
CATATATTGA 
AATTAAGGTT 
TAAGAAGAGG 
AAAAAAAAGA 
CACAGTGAGA 
GCTATACCCT 
CCACACAATC 
GTCACTTACA 
ACAAAAGCCA 
GACTCCTCCA 
CAGAGTCAGC 
ATGTTCTAAC 
ATGTAGAAAA 
TAGTCTTGGA 
TAAATTGGTC 
TTTTTATGGA 
TGGGTATGCA 
GACAGTCAAA 
AACTAATATT 
AGTGGGTTAC 
GGGGCTGGCA 
TAACCATCTT 
TCTATGTTCC 
ATGCACACCT 
TCTGTATTCT 
CTTGCTTTCA 
CCAGTAATGA 
CATGTTAAGC 
GTACTTATAT 
TTACCTAGAT 
GTACTCCCAG 
ACATTTTCCA 
CAAATTATTC 
TTTCCCTGCC 
GTCATTGCTT 
ACCAAGTAAT 
GCCCTGTAAT 
TGGAATTATC 
AAAAAAAGCA 
TTAATCGTAA 
TGCTTGAAAA 
AGGAAAAGCC 
CTGAACATGT 



TTATTGACTA 
CACATCTTTG 
TGGAAATACT 
AATTGCTATT 
ATAGCACTCA 
GAAAAAAAAA 
TGTTTCTTGT 
TATCAATCAT 
AAATATTTAT 
CTGCATGCAA 
TTGAGAAAGT 
AGCCATAAAC 
AAATGAGGTC 
AAGATTCTGC 
GGAAGAGAGG 
AGGTAGCACT 
GCTCTGAGAC 
TATGGTATTT 
GACAAGAAAA 
GAAACAAGTG 
CTCCACATGT 
ATTAGACCAA 
AACTTACCCA 
CTATGTTTAC 
GGGGTTTGTA 
CTAGAGAGAA 
AACATATGAT 
CAGTTGAGCT 
AATCTCAGTC 
TGTATATCAA 
CAGAAGGGAT 
GGAGCTGTCT 
CCATGTATCT 
ATGCAGGAAA 
TGGTTTTCAG 
GGATTACAAC 
AAAACTGACT 
GCTGTTCATA 
ATTGAATAAA 
ACTTATAGCT 
GTTTCTCCTA 
AACTTCCTAA 
GGACTACAGA 
AGGCTCATCT 
TTGGGGTCAA 
ATGTTTGGAT 
ACTTACAAAA 
AGCATCGTAC 
TACATACCCT 
GTTCAAATAA 
AATAATGTCT 
AGGAAGAATC 
AAGAAGTGAG 
AAAAATCTGT 
AGATGCCTTT
TAAAAAAAAC 
TATCACCGGA 
CCATGACTTT 
AAATATTATT 
CATGCCTGTA 
TTCAAGAGCA 
TGAAAAAGGA 
CTTCATCTAT 
GACTGGCACA 
ACCATTTTAG 
ATGTAAGAAT 
TCTGTCAACT 
ATTTAATAGT 
CATACAGGTG 
GAGTGACACG 
TTTTAAAAAT 
TGTTTTCTGG 
GGCTGAAGTG 
TTCTCCCACC 
AATTTTTTAA 
GGGCTCTAGT 
CCCTACCTGG 
TGTATGGGTA 
AAGTTTGAAG 
CTTTACCACA 
ACTCCCAACT 
CCCAAGACAA 
GCGCGGCGGC 
AGGTGGGGAG 
CAAAAAATTA 
GCAGGAGAAT 
CACTCCAGCC 
TAATATCTAC 
GTTTTCCTTT 
GGCCTCAGCC 
AGAAGCTCAG 
CCAAATCTGC 
GGAGTGCAGT 
AGAAGCTGGG 
GGACCAGGCC 
AGGCATGAAT 
TTCTAGTAGG 
GGACTGTGTC 
TAATGAAACC 
GCCAATCTTA 
AGCTACAGGG 
TTTATTAGAT 
AGCATCTTCT 
TAAATAATTT 
TATCCATTTG 
TAATTTACAA 
GCTTTCTCGA 
CAAATATGAA 



ATTTTATTCA 
AAATTAGAAT 
GATAAGAATT 
GCTACTTAGA 
TCAATAATGA 
ATCCCAGCAC 
TCCTGGGCAA 
AGACTGAATT 
AAAGTTAATT 
GAAGAAGCAC 
CTATCTAATG 
TTTCTAAATT 
TGTAAATATA 
ACAACACTCA 
CTCAGAAAGA 
GTGCTTTAGT 
GAACTAGTCC 
AAAAAAAAAA 
CAGTGGCACA 
TCAGCCTTTT 
TTGTAGAGAC 
GATCCACTAG 
CCTGTTCCCT 
TAACAGAGAG 
TCTTATCTTT 
CTGTCCCCTT 
TCTGACTGTG 
CTACAACAAT 
TCACGCATGT 
TTCGAGACTA 
GCCAGGCATG 
TGCCTGAACC 
TGGGCAACAA 
CAAATGTTTC 
GCTGAGACCC 
TTTGTGAGCA 
GGGAGCACAC 
CAGCTCAGTT 
AGCTGCGACC 
ACTGCAGGCA 
AACCTAGTCT 
CACTGCGCCC 
TTCTTGAGTC 
TGTTTATCTG 
AGAAGCAAAA 
CATAATGTGT 
ACTTGGGAGC 
TGCACATGCC 
GACTCCGCAA 
TAAATAAAAA 
GAAGACCACT 
GGAAAAGGGG 
ATAGTTTTGG 
TTTCCGCAGA 



CTCACACACA 
GTAAAATTAA 
TATTATTTTT 
AGTTAGAGAT 
ATGTTTAGAA 
TTTGAGAGGC 
CACAGCGAGA 
TCCTTTGGGC 
CCTACATTTT 
TATATACTAT 
CAAAATATGA 
CTCTAATTCT 
AGGATCAACC 
GAAATTATCA 
TGCACCTGTA 
GAGTTGTGGA 
ACAGTAGAAT 
AAAATTTTTT 
ATCATGCTCA 
GAGTAACTGG 
AGGGTCTTGC 
CCTCAGCCTC 
GAATTTTTTT 
ACAGAGAGAA 
TGGCTTTTGT 
AGGCAAGGTC 
GGCCCTTCTC 
ACAACAAATT 
ATTCCCAGCA 
GCCTGGCCAA 
GTGGTGGGCG 
TGGGAGGTGG 
GAGCAAAACT 
ACACAAGTAT 
TATGCTCTGG 
AGCTCTTATC 
TGGACATTAT 
AATTAATTAA 
TCAAGCTCCT 
TGTGCCACCA 
TGAACTCCTG 
AGCCAACCCG 
TAGGGTTCCT 
GGGATGTAGG 
CTCAGTTGAG 
GAGATCTTGA 
ACCTTTAATT 
TAAATAAAGA 
TTAGACAGCT 
TCATGGCGTG 
CTGAAGAGAT 
AAGTTTTGTT 
CATCCAGGGT 
TTATTCAGCA 



TATGTAGAAA 
TACTTTAAAA 
AAAATAAAGT 
GCCAAAGTTT 
GACTGAATTT 
TGAAGAAGGA 
CCCTGCAGCA 
AAGTCATGTG 
TGGGGAAGGG 
ATATATGTGG 
ATCTTTTTTT 
GTGTTAGTTT 
TGATCCACAA 
AAGGTCAGAG 
ATCTCTCTAA 
ATCAATCTCA 
ATACTAAAGT 
TTTTTTGAGA 
CTGCAGCCTT 
GACCACAGGT 
TATGTGCTTA 
CCAAATTTAT 
TTCTTTCAGG 
AGAAACTTTT 
TTCAGAAATA 
TTTGCCATTC 
AAAAATGATT 
CTCTGCTTAA 
CTTTGGAGGC 
CATGATGAAA 
CCTATAATCC 
AGGTTGCACT 
CTGTCTCAAA 
TTGGGGATCT 
CCACACTAAA 
TCCAGGCCTC 
TCCAACAACC 
GCAATTCAGA 
GGGCTCTAAG 
CACCCAGCTA 
GCCTCCAGCC 
CCCAGTCTTG 
ACCTCATGTT 
GGTGGGCAGG 
GACACCGGTC 
TATTACCCCA 
ACAGACAACC 
CATCCTCTGC 
AAGAGATCTG 
AATAATTTCT 
GAAATAAGTC 
CCTCTCCGTG 
CATTTTTCAT 
CTAGACCCTG 



GAGAAATATA 
AATGGGCTGT 
TATTTTCTCT 
ATCTAAGAAA 
CCTGACTGGG 
GGATCGCTTG 
AAGTAAAAAG 
ACATTCCTGT 
AGAGAAAAAC 
ATATCATTTG 
TCTGGGTCTT 
TAAAGCAATG 
TTTGACCCCT 
AAGCCAAACA 
GGAGAAATAT 
TGATTTCCAA 
GCTGGTGCTT 
CAGGGTCTCG 
GACCTCCTGG 
ACGTGCCACC 
GGCTGGCCTT 
GGGATTATAG 
TGTTTGTGCA 
CTATCACACT
TTTCAAATGT 
TTCTGAGACT 
GTTTATGCAA 
AAACTTCCAA 
AGAGGCGGGC 
CCCCATCTCT 
CAGCTAATTG 
GAGCCAAGAT 
CCAAACCAAA 
TCACAAATGG 
CTCATTCAGC 
TCACAAAGAC 
CTTTCCCCAC 
GATGAGGGTC 
TGATCCTCTT 
ATTTTTTTTT 
TTCCGAAGTG 
TTAGACATGG 
TTATAGTTAA 
GGGATAGAGG 
ATGAGAGTGG 
TCCTTGAGAG 
CATGTTCCTG 
AGTCTTTTGA 
TGTTACTTCC 
TTCCTCTACC 
TTCTGCCAAA 
AATTTGATTG 
TAAAAAGAGA 
GGAGATTCTG 



TGGTAAACAT 
ATACTTTTCT 
GTGACTGTTT 
ATGTTTATGG 
CACAGTGGCT 
AGTCCGGGAG 
AAAAAAGAAT 
GCCTCAGTTT 
TTAGGATAGT 
TTTTTATGGT 
AAATTATGGA 
GAGTAACGTA 
AGCCACTAAT 
AATGTAAAAA 
TTTCCAAACT 
CCTAGTGTTC 
AAGATAGTAT 
CTCTTGCCCA 
GCCCAAGTGA 
ACACCCGGGT 
GTGAACTCCT 
GCATGAGCCA 
TATGTGTGTG 
TTGCAATCAG 
AGACTCTCTC 
ATTGCAACAG 
TAAATCTAAA 
TGTCTGCCGG 
AGATCACTTG 
ACTAAAAATA 
GGAGGCTGAG 
CACACCATTG 
ACAAAACTTC 
CCCTTATGGA 
ATCCCAGAAA 
CTGTTCCAGT 
AGCTATGCAG 
TGCCCAGGCT 
CAGTCTACCC 
TTTTCAGTAG 
CTGTAATTAC 
GGTCTGTAGT 
TTTAGGGGAG 
GGACTTCAAT 
CCTGATTATG 
TCCTCTATAA 
TGGATTATGA 
CAATTCTATA 
CTCACATATA 
GATTTGAAGC 
GATTACTTAT 
AAAATCGAGG 
AAAGTCATGT 
TAAAGAGGGG 
TTTTGTTATA
GGGGCGGTGC 
GCCGCCCGGG 
GTCTGAAACC 
AACCAAGAAG 
CCTCTCTGTG 
GTCTTTGGTT 
CAGCCGCATC 
GGGTACTGGT 
AAGCAAGGCT 
CAAGTCACCA 
TAAAACTGTT 
CCCAGTGAAG 
AAAGGCCACA 
GCTCTTTTAA 
TGACAGTTAT 
GAATTAATTC 
ACCAGGGGTC 
ATGAATTGAG 
GTCAGGTTAG 
CATGCAAGGG 
TGGAGCAGTT 
TAATTGTCTT 
AAAAAGTAAA 
AAGGATACAG 
CGTTGTCTCA 
ACTGCCGTAT 
GATGGTGCAC 
TCAGGGAAGG 
TTTTTTAACA 
TAATATGCTA 
TGGATTAGGG 
GTAGCTGAAA 
GGTTATAGTT 
TATATCAACA 
GGTCTAATAC 
GTAAGTTTCA 
TTAATGGAAA 
ACTTGTTTTA 
TTTTCCTTTG 
TTAGTCTACT 
TGCATTGCAT 
TCTGAACTTA 
GTAAACATTA 
ATGTAATTGC 
TCACTTGCCT 
TCACCAGATT 
GGTCTCTCAA 
TTGTATGTAT 
TTATGGTTCT 
GGATGGAGCC 
AACCTCTGCC 
CAGGTGCCCG 
CTTGGCCAGG 



CTCAACTTTT 
CTAGGTGATG 
TGTTTCATGC 
GTGCCTGCAG 
CGAGGGAGGA 
TCCAAGTTGA 
GCGCTCAAGA 
AAACTGTCCC 
GCTTCCGGTT 
AAAAAGTCAG 
AAGACTGCTA 
AGGAGCGGGA 
GCAAGGGCTT 
TCTAAGAAGT 
GAGCCACCCA 
CTATAGGTTT 
AGGCCAGGCT 
CTCATTCCCC 
CCGCACAGCT 
CTGCAGCATT 
ATCTAGGTTT 
TTAGTCCGGA 
ACACAAAACG 
TTAGTCAAGC 
TGAGCTATGA 
AAACTTAAAA 
ATCTAGAGGT 
TAGAGGAGGC 
GAAGTGGAGA 
CAAGTACTAC 
TCCACTGACT 
GAGTTTTTTT 
TTTAGAATTT 
GTTGCAAGAA 
AAAACACACA 
AATTGTAACC 
TCAAACTACC 
TGTTGGAGGC 
ACTCTCAGTC 
ATTTTTGATT 
TTGGACCATG 
TATTTCACAT 
TTTGTATAGT 
TGTTTTAAAT 
CATTGTTTCC 
GACATAGCAG 
CTCATCACAT 
TAATATGGGA 
ACAAGGCTAG 
TCCTCCAGTT 
TCGCTCTGTC 
TCCTGGGTTC 
CCACCACGCC 
CTGGTCTTGA 



CCGGGTAAAA 
CACCAATCAC 
TTTTCGCTGG 
CTTCTGCCAG 
AGCCGGCTGG 
TCACCGAGGC 
AGGCATTGGC 
TCAAGAGCTT 
CCTTTAAGCT 
TTTCTGCCAA 
AAACCAATAA 
GAAAGGCTAA 
CGAAGTCAAA 
AAAGAGCTTT 
CATTATTTTA 
AAGTTGTGAT 
TCAAGACCAT 
CGGCCACCGA 
GAGGGGTGAG 
AGATAGATTC 
CAGGCTCCTT 
AATCATTGCT 
GTCTCTTGTG 
ATGGTTGGCA 
TGGTGCTACC 
AAAAAAAAAG 
CCAGGAACTA 
TTTTACATGT 
GCAATTTGGC 
ATTCTAGTCT 
TCAAGGGATC 
TTTGTTGTTG 
TCTTCCATTG 
TCTGGAAATC 
TATTACGGTC 
CTATGAATTA 
AGAGCATACC 
TGCCCAATTA 
TGAGGTAAAC 
TAGTATTCTT 
GTATTTCGAG 
ACATTTATGT 
TTGGCATCTT 
TTGTATAGAT 
CAATGAGTTC 
AGAGCCATTT 
ACAGTGAGGA 
CTCCCTCAAG 
CATGCCTGGC 
CTGGGGATTA 
ACCCAGGCTA 
AAGCAGTTCT 
CAGCTAATTT 
ACGCCAGACC 



CAAACACAAA 
AGCGCGCCCT 
TTATTACATC 
TGCTGGTCTA 
CTTGATAAGT 
CCTTTCAGTG 
CGCTGCTGGC 
AGTGAACAAG 
TAGTAAGAAG 
GACCAAGAAG 
GAGAGCCAAG 
AGGAGCCAAG 
ATTGACCCAA 
CCGGGAGGCC 
AGATGGCGTA 
GCAGCTGAGT 
CCTGGGCAAC 
CCGGTAACCG 
CGAACATTAA 
TCATAAGCTC 
GTGACAATCT 
CCCAGCCCCT 
TCAAAAAGGT 
CGCTCCCTTA 
TCACTCCAGC 
TTAAAACAGA 
AAAAGTCTGA 
AAGAGCATCT 
ATCCAAACAT 
TTCTGTGGTG 
AATAAATAGG 
TTGTTGTTGT 
TGTGTGACTG 
GTGCTTGCTT 
AAGTGGTTTG 
CTTTAAGTAT 
GAAGACTGAA 
GGTTCTGAAT 
TACGTTTCTC 
ACTGATCATC 
AAACTTTGAA 
TTTCCAGACG 
TTTAAAAATT 
AAAATCAACC 
GGAATTACTA 
TGCCTAAATG 
TGAACAACTA 
ATGGCTTCCT 
ATACATAAGG 
TTAGACCACT 
GAGTGCAGTG 
CTGGCTCAGC 
TTGTATTTTT 
TCGTGATCCA 



TACTCCTCCT 
ACCCTATATA 
TTGCGTTTCT 
GCCGCTATGG 
GCAAGTCGCA 
TCACAGGAAC 
TACGACGTAG 
GGAATCCTGG 
GTGATTCCTA 
CTGGTTTTAT 
AAGCCGAGAG 
GGTAAGCAAA 
CATCATGAAG 
AATTTGGAAA 
ACACTGGAAA 
TGAAAAGGCT 
ATAGCCAGAC 
GTCCCTGTCC 
CCAACTGAGC 
AAACTGTATT 
AATGCCTGAT 
GCACCCCCTG 
TGGAGACTAC 
GTCCCTGCAC 
CTGGGTGACA 
AAAAGGGCTT 
TGTCCAATCC 
AAGTTCTGGA 
AACTTGCTGA 
TCATTGTAAC 
AATCAAGGTG 
TTTCATCTAT 
ATAGAAATAA 
ATTTCCGAAG 
ATAATTATTT 
CTTATTTATG 
AAATTTTAAG 
TCCACCTTCC 
TTTAAACAGA 
ATAAATAACC 
CAAAGTCCCC 
GTTCAATAGT 
GTGTCCTATA 
ACAGACCTTT 
GGATTGTGCA 
CTGTGCCCAG 
GCCTCTCCCA 
GCACCTTTGC 
TTAAAAACAA 
TTTTTGTTTT 
GCACAATCTC 
CTCCCACGTA 
AGTAGACGGG 
CCCACCTTGG 



CCAAGGGGCG 
AGGCCCCGAG 
CTGTTGTTAT 
AGAAACTTCC 
AAGTGCCGAA 
GAGTAGGTAT 
AGAAGAATAA 
TGCAAACCAG 
AATCTACCAG 
CCAGGGACTC 
CGACAACTCC 
AGCAGAAGAG 
TTAATGTTAG 
GAACCCAAAG 
CAAGTTTCTG 
TGAGATTGGA 
TACCATCTAT 
ATGGCACGTT 
TCCACCGCCT 
GTGAATGGCA 
GATCTGAGGT 
GTCCGTGGTA 
TGGTTTTACA 
CCAGGCGTTT 
GCGAGTCAGA 
CTTGTCAGAG 
TGAAAAGCTC 
AATGCCAGTG
TACTTTTTTT 
TATTGTTTCT 
TCCCAGAATA 
TCATTATCCT 
CAAATTTGTA 
TACTATTAGG 
TAATATTATT 
AAAAGAATCT 
AATCCAAACC 
TGAATCACAA 
CATAGTTTAA 
AATGCTAATG 
TGCAAAACTA 
ACCTCACTTT 
ATGAAAGGTT 
CCTTGCTTGG 
AAAATATGCC 
CAATGGACTG 
GCAGCTGGCC 
TCCTCTAGCC 
AATCAATAAG 
GTTTTGTTTT 
GGTTCACTGC 
GCTGGGATTA 
GTTTCACCAT 
CCTACCAAAC 
TGCTGGGAAT
AGGACAACAG 
CATAGCTGTG 
CCTTCGGTTA 
TTTTTTTTTT 
CTCGGCTCAC 
CCCGGCTAAG 
CTCGGATTCT 
GTCGTGAGCC 
TACCTCTCTA 
ACACAATTCA 
AACGCATTTC 
TTTGTTTTTG 
CCATACAGAG 
TTGCGCTTGG 
TTAAGCACAC 
CGAGCCAAAC 
CGATGGCGCT 
ATTCCTATCG 
CTGATTGAAA 
TGGGTGGAGC 
TTGCTCTTTG 
TAATGACGCT 
AATTACGGAA 
GGTCAAGGGA 
GGCCTTCTCT 
TATGTTTTGG 
AACTTCTTCC 
CTGCAAAAGT 
GGACGCTCTT 
CTTCCCTCCG 
TTTGTACAGT 
ATCTACAGGG 
TAATCCCCTT 
GGACTTCTCA 
AAGATCCACT 
TGAATTTCTA 
ACCTCAGACT 
AGGCCAGAAT 
CTTCTGCCCC 
ACCTTATAAC 
AGTCTGAGAT 
AAAATATAGC 
CTGGAACACA 
TTTATTTTGC 
TTCTATTTCC 
ACAATGAATG 
TGGATTCTTC 
AGGCTCACTC 
CCATCAGATC 
AAACAAGTTG 
TCAGAGGTTA 
AGCACTTCAG 
GGCAACATAG 



ACAGGCGTGA 
CCATAGAACC 
TGCCGCATGA 
AGTCCAAAGT 
TTTTTGAGGA 
TGCAATCTCT 
TTTTGTATTT 
TGATCTCAAG 
ACTGCGCCCA 
TGCCTACTTT 
TTCTTATGCA 
AGCTCTTTAA 
TTTTTTGAAG 
TGCGCCCCTG 
CGTGCTCCGT 
CTCGAGTCTC 
GGCGGATAGC 
TAGCACCACC 
CAGTGGAAGG 
ACAACATGAG 
AAGAAAAACT 
TGGGAGTGGA 
AAGCATAGCC 
TAGTTTATTG 
TTTACCCTCC 
GTGCTGGACA 
GAAGGCAAAA 
TCGTCACTTT 
CTGAGAGAGA 
CCTGGTCTGC 
TTTTTAATCC 
TGGGGATTAA 
TCTTCCTCTG 
GAAAAATAAA 
ATTCAAAATT 
TTTGTTTAAA 
TTGGGAATAT 
TGCTCACATG 
GCTCTACATG 
AAATTGAGTT 
TAGAGACTTA 
CATAAGTAAA 
AGCATATGCA 
TCTCGCCAAG 
CTATTATATC 
TGCTCCTAAT 
TCAGAACAAA 
CACTCTGGGA 
GCCTAATCTG 
AAACTCATTC 
TGGTTGAGAG 
ATCTATGATA 
GAGGCTGAGT 
CAAGTCTTCA 



GCCACCGCGC 
CTCCGCAAAT 
GCCAAAAGGT 
ACCATTCTTA 
GTCTCTCTCT 
GCTTCCGGGC 
TTTTTGGTAG 
TGATACACTA 
GCAAAATGCT 
ATGCTTTGAA 
GGCTGTCACG 
ACGACTTTGT 
TTCTCAGGAG 
ACGTTTCAGG 
ATAGGTGACG 
CTCATAGATA 
CGGTTTTGTA 
CTTCCCCAAG 
TATGAACTGA 
TTGGCGCGGT 
GTTTCATTAT 
AGGGTGTTTG 
CCATTCCACA 
GGGAACATAC 
CAATCATTTT 
AGGTATAAGT 
AGGTAGCCAA 
CCCTATCTCG 
CAGGGAATAT 
TGTGCCTGTT 
CCTTTCAACT 
TTGAAGTGTA 
GGAGGTTTTT 
TAGAATAGCA 
TTATTCTTAG 
AAACAAAAAC 
TTAGAATGGG 
AAGAGAAGAA 
TCATATTGTT 
CTTAGGTTCT 
GCTAGGAAGA 
ACTCTGAAAT 
ATATGATAAT 
TGCCATCTTC 
ATTTATAAAA 
ATCTCCTTTC 
TATTTAAAGG 
AGAATTCAGG 
TATGGCTTCT 
ATTGAACAAG 
GATACATGAA 
TTAATCAGAG 
TGGGAGAATC 
TCTCTACTTA 



CCGGACTTAG 
GAGAGCTTGT 
GATAACCTTT 
GAATGCTCTA 
GTCTCCCAGG 
TAGCTGGGCC 
AGGGGGTTTC 
GCTTTGGCCT 
TTTTGTGGAG 
ATTTTGTCAC 
GTTATTTCTG 
GAGCGGCCCT 
ACCGCGTATT 
GCATATACTA 
GCGTCTCGAA 
AGACCGGAAA 
ATGCCCTGGA
CCTTTTCCGC 
AACAGTTCCT 
TTTTTTTTTT 
GGTTCATTGT 
CAAGTTGAAT 
TTTCTTTTTA 
AAATAATGTT 
AATATTTTTA 
TTGGCTATGA 
ATAATTGCAA 
ATTCAAATAT 
AAACTTAAGT 
TGCTGTGCCT 
TGCTACAGCT 
GGGCTAATAC 
GTGATAAGAT 
GAATTGGGTC 
CTTCCTGTGG 
AACCCCACCC 
GCTGTGGCCT 
ATCCAGGAAT 
TGTATCACTT 
TCCACTCACT 
AATGTCAAAC 
CTCAACATGC 
TCTCTGAAAA 
ATTTTAACCA 
CCCCATTTTT 
TAAACTTTTC 
ATCTGTACAT 
CATACTCAAT 
CGTTCGCTTT 
AGACCTAAGC 
GCATTCAAAC 
GTTAATGCAG 
GCTTGAGCTC 
AAAAAAAATA 



ACCACTTTGT 
CCCTAAAGAT 
GTTCAACACG 
AAATACATAA 
CTGGAGGGGA 
TACAGGTGCA 
ACCATTTTGG 
CCCAAAGTGC 
CCAATCACTT 
AGTGGGGCCG 
TCATCCAAAC 
GAAAAGGGCC 
CTTAGATTCA 
CATCCATGGC 
TAACGTTCTC 
TGCGCTTGAC 
TGTTATCCCG 
CTTTGCCGCG 
TAAATACAAA 
TTTCAAATTT 
TTTGATTGGC 
GCGCTGTATT 
TTTCCACTTG 
TAAAGGAGGT 
TTTAAACCAG 
AGTTTCACTC 
ATTAAAACCT 
TTGTTGAATG 
CTGGATAATA 
GAAATTCCAA 
TTAGAGAAAA 
TTGATTAAGG 
TATTGGTGTT 
TGAATGTGGT 
GAGCTTTCCA 
ACCACTCTCT 
GTGAGAGACA 
GGAGAAAAAA 
CTGAAATAAT 
GTCCACATGC 
ATTACAGAGA 
CTTTTAATTC 
CATACATCAT 
GAGGTCTAGG 
ATTTTGATAT 
TCAATGACAG 
GTAGATATAT 
CTTATGGTTA 
CCATTTCACC 
CCTTCAGATT 
AAATAAATCT 
TGGCTCACGG 
AGGAGTTCAA 
ACCAGAGGTG 



TTTGGCCAAT 
GCTTTATTTA 
CGCCTCCAGC 
TTTTTTTTTT 
GTGGCGCGAT 
GACCACCACG 
CCAGGCTGGT 
TGGGATTACA 
TATTAGCGCT 
GTCATGGCAA 
TCATTCTCGC 
TTTGGGTTTT 
GCCGCCGAAG 
TGTGACAGTT 
TAAGAAAACC 
GCCACCGCGC 
GAGCACCTTA 
ACCAGACATG 
CTTGGCGGAC 
GGTCACCGAG 
CAGTGACAGC 
CCTGTCAGCT 
CTAACTAATA 
CAGATTTATA 
GCATTTTGAT 
CTAAAGACCC 
CATAAGTGCA 
ACTCATTTTT 
TGTTTTCCCG 
ACACTCTTCC 
GAACATACGT 
TCATTACAAA 
AAAATAAGGC 
TTGAAGAAAG 
GAATGCCCAT 
GGTTAATAAA 
TTATATAGTA 
GACCCAGGAA 
TGATTACATT 
CACAACACAG 
AAAAATGCAG 
ATGAAAATAA 
GTGAACTACC 
ATGCCTTTCC 
TTTATTTACT 
TGACTCAAAA 
ATATTTAAAA 
GGGAGAGATT 
TTCCTCTCAC 
AAAACTCTGC 
ATGATATTAA 
CTGTAATCCC 
GACCATTTTG 
TTATGAAAAT 
ATAAATTGTC
TGAAATATGG 
TATATATGGC 
CTAGCAGCCC 
AGAAATCCCA 
AGTCTCACCC 
TTGGGGACAT 
CACTCCCATC 
ATCCATTTTC 
GTTCCGTAGC 
AGAAAAAAAT 
TTGCACTTCC 
CTTCGATTTG 
TTAAAATTTT 
ATGGTAGCCT 
GGGACTACAG 
TTGAGATGGA 
CCGCAACCTC 
GATTACAGGC 
CTCCATGTTG 
CAAAGTGCTG 
TAGAGATGGG 
TGCCCACCTT 
AAAGATAATT 
AGAGTAATTA 
GCACAAAGTA 
AAAACCGTAA 
TGGGGGCAGT 
AGTGGCTCAT 
AGGAGTTCAA 
ATTAGCTGGG 
GAATCGCTTG 
AGCCTGGGAG 
TGACTTAGTT 
GTGACTTGCA 
ATTAAAGACT 
TAAATAATAA 
GCTTCAAGGC 
TCAATTCAAA 
ACGCTGAGGT 
CAGAACCCGT 
CAGTAGGCTG 
ATCGCAGTAC 
ACAAGTTAGA 
GCCAAAAAGG 
AACTCCATCT 
GTCTCAGCCA 
CAGTGAGCCG 
GAGAAAAAAA 
CTGAAAGATA 
TCCTTTTTCC 
GCCACACCGT 
CTAGAATAAA 
GGTTTCATAA 



CAGAACTACC 
TGTGTGTGTG 
ACCTATATAT 
AGCATACACT 
GAGTAGAAAT 
AACTTAGACT 
CACCACTGGT 
TAAGGCTTCA 
TGATTCATTT 
ATACCTGTGT 
CTCTGCTTTT 
TAGGCTTGCT 
TTAAAAATAA 
TAATGTTTAT 
ACACTTCCCC 
GTGTGCACAA 
GTTTCGCTCT 
TACCTCCCAG 
ATGCATCACC 
AGGCTGGTCT 
GGATTACAGG 
CTTTCCCTGT 
GTCCTCCCAA 
TCTAACATTA 
AATTTGATTT 
TTTTACATTT 
CATTATACCC 
AGTGAAGTTG 
GACTGTAATC 
GACCAGCCTG 
CGTGGTGGTG 
AACCTGGGAG 
ACAGGGCGAG 
TTAAGGAATA 
CTGAAAGTTA 
CCAAAATTCT 
AAAAATACTT 
TATCCATCCC 
AGTTAGAAAT 
GGGTGGATCG 
TTCAATAAAT 
AGGTGGGAGG 
TGCACACCAG 
AATTTGGCTG 
GCGGATCATT 
CTACTAAAAA 
CTTGGGAGGC 
AGATCATGCC 
AAAAAATTCT 
TTTCAAAATA 
CTGCAGCAAA 
ATGTTTCCTT 
AATCCAGCAC 
CAGCAACATA 



CTCCACAAAC 
TGTGTGTGTG 
TCAACAAACA 
ATCAGTTTTA 
ACTTTTAAGC 
ATGGGGGCTT 
CTTGGGCAAG 
CTGCATTTCT 
TTTTCTGAAT 
CTCTGCTGTG 
TCTTTTCAGT 
GTCCTTGTGT 
AGATATCTGG 
TTTTTTCCTA 
GGGCTCAAGT 
CCACACCTGA 
TGTTGCCCAG 
GTTCAAGCAA 
ACGCCCAGCT 
GGAACTCCTG 
CGTGAGCCAC 
GTTGTCCAGG 
AATGCTAGGA 
TCCTCTCTTA 
TCAAAATTCC 
GTTTTAATGA 
ATACTTAAAA 
GTTATTTACT 
CCAGCACTTT 
ACCAAAATGA 
TGTGCCTGTA 
GCGGAGATTG 
CTCCGTCTCG 
AATCAAGGAT 
TACGAATATT 
TTTTAGAATC 
TGTATCTAAA 
CAAATTTCTC 
TTGGCCGGGC 
CATGAGCTCC 
AATAGAAAAA 
ATCACTTGAG 
CCTTGGTGTC 
GGCGCGGTAG 
TGAGGTCAGG 
TACAAAAAAA 
TGAGGCAGGA 
ACTGCATTCC 
GTATGAACTG 
TTTAGGAAAA 
CATTAGGAGT 
GGCTCAGACA 
ATCATTTTCT 
AGCATAACAG 



TAACTCTCTC 
TATGTGTGTG 
ATTCTGATAA 
AGTATATAAT 
TATATTACAG 
TATAATGTCA 
AAACTCCTCT 
CTTTTTCAGC 
TAAACTGTCA 
GTTTTTTTTA 
TTAAATTATT 
GGGCACGCTC 
ACAGAAAATT 
GACTGGAGTA 
GATCCTCCCA 
CTAATTTTGT 
GCTGGAGTGC 
TTCTCCTGCC 
AATTTTGTAT 
ACCTCAGGTG 
CACGCTCGGC 
CTGGTCTTGA 
TTACTGGCGT 
AACATTTGTT 
CTTGAATACT 
TGAAATTGTG 
CAGATGCCCT 
GTTTTATGAA 
GGGAGGTCGA 
TGAAACCCTG 
GTCCCAGCTA 
CAGTGAGCCG 
AAAAAAAAAA 
ATTTAACTCA 
GGTACTTATT 
TTCAGAGTAA 
TCTGGTGTAT 
CCTGAATGAT 
ACGGTGGCTC 
GGAGTTCAAG 
AATGAGCCAG 
CTCAGGAGGT 
AGACTGAGAC 
CTCACGCCTG 
AGTTCGAGAC 
CTTAGCCGTG 
AAATTGCTTG 
AGCCTGGGTG 
AACAAAATAT 
AAATTATAGG 
GCTGCTGTTC 
TAAGGTTGTG 
TCAGCAAGTT 
AATAGCAGCA 



AGAATATTCG 
TGTGTGTGTG 
TTGGCCAGGG 
TGCGCTTTAG 
GTGAGAAAAT 
CAACAGTTGT 
AGCCAATGGC 
AACCTAACTT 
GTACCATTGG 
CCTCCACTCC 
TCACAAAAAG 
CCATAAACAC 
TCTTTTCTTT 
CAGTGGCACC 
CCTCAGCCTC 
TTATTTGTTT 
AATGGCGGGA 
TCAGCCTCCC 
TTTTAGTAGA 
ATCTGCCCGC 
CACTAATTTT 
ATTCCTGGGC 
GAGCCACCAG 
TCAAAAATTT 
TTCTTAATAG 
AACCCAAACT 
CATATACATA 
AGTGCCATTC 
GGCAGGCTGA 
TCTCTACTAA 
CTCAGGAGGC 
AGATCGCACC 
ACAAAAAAGT 
ATAGACTACA 
CCCCTGCCCC 
AAGCTAGAAT 
AAAATAACTT 
AAAGAGAATA 
ACTCCTGATA 
ACCAACCTGG 
GCGTGGTGGT 
CGAGACTGCA 
CCTGTCTCAA 
TAATCCCAGC 
CAGCCTGGCC 
CATGGTGGCA 
AACCCAGGAG 
ATAGAGTGAG 
CCTTAAATTT 
GATCAGGCAA 
CTAAAAACAT 
TAGTTGTTAT 
AACTAACCTC 
ATAGCTCCTA 



ATATGAGGAA 
TGTATGCACC 
TTGAGAATGA 
TAAAATGTAA 
GCATAAGTAT 
TTCCAGGCAT 
TGATTTATCT 
ATTTAAAAAT 
CACACCTTTG 
TTACTTTTCT 
TTTTCTTGAC 
TATTAATACA 
TTTTAAGATT 
ATGATGGCTC 
CCAAGTAGCT 
GTTTTGTTTT 
TCTCGGCTCA 
GAGTAGCTGG 
GACGGGGTTT 
CTCGGCCTCC 
GTATATTTTG 
TTAAGTGATC 
GTCTGGCTGG 
TACAAACATG 
CACACAGAAA 
TACACAAAGA 
GTAAAACTCT 
AGCCGGGTGC 
TCACGAGGTC 
AAATACAAAC 
TGGGGCAGGA 
ACCGCACTCC 
GCCGTCATAG 
GTTAGCTAAC 
TGAAGTATGA 
TTGATTTTTT 
GGTGGATGAT 
AATGAATATG 
ATCCTTTCGG 
GCAACATAGC 
CCCAGCTACT 
GTGAGCCGTG 
CAACAACAAA 
ACTTTGGGAG 
AACATGGTGA 
TGCGCCTGTA 
GCAGAGGTTG 
ACTCCATCTC 
TAAAATACAT 
ATTCTGAGAT 
GGTAACTGTT 
TCCAGAATAG 
TCTGTGCCTT 
CCTACCTCAT 
AAGATTCTTT
AATAATTGGC 
TTGCTACATT 
AAAAGTTTTT 
AGAAATAGAA 
TAAACAATTA 
AATGCATGCA 
TTAATATTGT 
GGAAAAGAAC 
GCATGTGAAA 
AGATTGTGGT 
AGAGACAGTG 
GACAGTATTA 
AGCAGTCCTC 
TCTGTTGTGT 
GCTGGTAAGT 
TTTCCAGTGC 
GCTGCATCTC 
CTTCTCTTTA 
GCAGGTCTTT 
ATTAAGCTTA
CAGGCCACAG
GGGTGTTTCT 
TAATAAATTA 
GTCTGCTTTC 
GGTTTGCTGG 
AAACACGATT 
GAAATTAGAA 
TGAACAAGAA 
ATGCATTTTT 
ACTGGCTAAT 
TAAATCACAT 
AAAGAAGAGA 
TATATGGTAT 
CCAAGTGCAT 
TTTTGACGGC 
AGGCGGATCA 
CTCTACTAAA 
TTGGGAGGCT 
GATCATGCCA 
TTGAACATGG 
TTTTTTTAAT 
CCACTTGGTG 
GGAAATACCA 
AGCCCAGCAG 
ATGAATGCCA 
GGACTTTTAA 
CATGTGATTC 
CTTCTCTATT 
ACGGTCAGAT 
ATACTTCACT 
GTCTCACTCT 
GCCTCCTGGG 
TCTGCCACCA 



GGAAGAATTA 
TAAAAAAATT 
AATATATTGC 
GAAAAGATTT 
AATATATAAG 
CTCAGAAATA 
GGATGTCAAC 
AGTAACATTC 
ACATTATACC 
ATCTTTAATT 
TTGAAAAAAA 
TTTTGTTTTT 
TAAGATGACA 
AATCACCTGC 
TCCCAGTTCA 
AGTTTCTTGT 
ACGCCCCTCC 
TGGTCACCGG 
CTACCATGGT 
GATTTTTCAA 
AAAAACACCG 
TTGCTGATGT 
TGAAATCTCA 
ATTGTAACAT 
AGTATAGTAA 
TAAAATAACC 
AATTCGGCTA 
GCAAAATAAA 
CTTCAATAAA 
AATGCAACAA 
TAAATAACAG 
TTACTTTTTT 
TGAGATATCT 
CCTGAAGCAC 
GTAGTAACAT 
TGGGCAGGGT 
CGAGGTCAGG 
AATACAAAAA 
GAGACAGGAG 
TTGCACTCCA 
TGAACTGATT 
GTGCACCGGA 
GCTTCCATTA 
CCAGAGTTCT 
GGCCACTAGC 
AAGAGAGCAA 
AGGAAACATG 
AAGCTCATTC 
TATTCTAGGC 
TGACTGAGTT 
TTCTTTTTTT 
GTCACCTAGG 
TTCATGCCAT 
CGCCCAGCTA 



AATTAAGATT 
TTCTTAAGAT 
ATTGTGGTGA 
CTGCCATGGA 
TATCAACTCC 
GAATGCTTGA 
GCATCCTAGG 
TACATGTTAG 
CAAAGCCTAC 
TGAAAGTCAG 
GTTAGTTTAA 
AAATGTGTGT 
TTATTATAAT 
TGTACTTGAC 
GGCAGCTCAG 
TTGTTTTCTC 
ACCCATTCTT 
ACCACCGTGG 
TTGTGAATGG 
ATGTAGTTGA 
AAAGAAAATG 
TTAGTAAATG 
GCCCAGGTGA 
ATTCCTTATG 
GATATTAAGA 
AATGTCTTAC 
CCACAGTTGA 
TGTCTCCAAA 
ATCATGCAGT 
TAATACTAAC 
CTTTAATTGT 
CTACATAACT 
TTGCTAAAAT 
CTGCCCTTCA 
AAAGTAAACA 
GGCTCACACC 
AGAGTTCGAG 
TTAGCCGGGC 
AATCGCTTGA 
GCCTGGGCAA 
TCCCAGAATC 
ACCCCAGTGG 
TACCATCTCA 
GACTCCAGAG 
TGTCCCCACC 
CAGAGGAGCA 
ACAGCTGAGG 
AGAAGAAACA 
ATCTAAACTA 
TGAAACCTGT 
TTTCATTTTT 
CTGGAGTGCA 
TCTCCTGCCT 
ATTTTTTGTA 



CAGAACACAG 
TATATATATT 
AATCAGGGCC 
AAACTTTTAA 
AAATCCACCA 
GATACCAGAA 
CTTTCAAATA 
AGTGTAGAAG 
AGAGAGAATC 
AAATATTTAA 
AACTGAGTTT 
GAGTTTGTGA 
ACAACATAAG 
TCAATGATTA 
CAATGGCCTG 
AAATTTTCAG 
TATTCCTTTA 
TACATTTACC 
TTTTGCCAGA 
CCTTAAGAAT 
AGGACTTAAA 
TGTTAGTGAA 
AATAAAACCA 
AGGTAGAAGA 
GAGAAATAAT 
AACTTAGACG 
ATGAAAATAT 
ATGACAAAGC 
ATACAATACA 
AGGTAATAGA 
ATTCATTTTA 
TTTCTAACCA 
TTAATGCCTA 
AGACAGAATG 
CATGCCATCT 
TGTAATCTCA 
ACCAGCCTGG 
ATGGTGGTGC 
ACCTGGGAGG 
TAGAGTCTCA 
TAGCAATTCC 
CTCCATGGAA 
AAATGAGAGA 
GCACTGGCCT 
AATTACAGTC 
AGGGAGTCAC 
ATCAGTTGGT 
CAATGAGACA 
CTGAATGTAG 
TTCTATCACT 
TTATTTTTAT 
GTGGCGCAAA 
CAGCCTTCCG 
TTTTTATTAG 



CCTAATATCT 
CATGGGGTAC 
TTCAATCCAT 
TGTACAAATT 
TATCTATCTC 
TGCATGCATA 
AAATTGTCAT 
TTAATCGCTG 
ACAATTACAA 
ATGATAGTCA 
ATGAAAAATT 
AGAATGTTTT 
AATTTTGGCC 
TCAGAGTGGT 
TGATTCCAGC 
GGGCTTTTCT 
CCTTCAGGAA 
TATGGCCACC 
GGTGAATAAG 
TTATGAATAA 
ATTTCTATTA 
ATGTGTTACT 
ATATAAAACA 
GTAAGTGAAG 
TTGTCATATG 
ACAATGTCCC 
TCCGTAAGAC 
GATTAAGTAT 
ATGTACATTT 
CAAGTTGTTA 
TAGCTTTTCT 
CAAAAAAAGA 
AAGAAGAAAC 
CTTGTACCAC 
GGATATATAT 
GCACTTTGGG 
CCAACATGGT 
ACGCCTGTAA 
CAGAGGTTAC 
AAAAAAAAAA 
TGAATGTCCT 
GGACCTGGGC 
GCTTACTCCA 
AGGGAGGACA 
CTTGCGTAGG 
ATTCCAGGAC 
TGTTTTCTGC 
AGAGAAGAGC 
TGGTGTCTGA 
GACAAACTAT 
TTTTATTTTT 
CTCGGCTCAC 
AGTAGCTGGG 
AGATGGGGTT 



AGTAAGTAAT 
AAGTACAATT 
CCCGGAAAAA 
CATCCATCCA 
TTCTGCACCT 
TCAAGTAATA 
ACAAAATACT 
ATGCAAAAAA 
ATATCAGCCT 
TTGTTAAATC 
TGGGGATTTT 
ATAAAATACT 
TGTACCTCTC 
TTGTTTTCCT 
AATTCAAATA 
CTACAAGTGA 
AACCCTCAGC 
AGGTGTCACC 
AATTTAAAAT 
AGCCAGAAAA 
AAAAAATTAA 
GTGAAGACTG 
AATGCTTACC 
CCTTATAGCA 
CTTTCAGAAT 
TAGAGTGAAG 
AAAATGTAAA 
ATACACAAGA 
ATTAAAGTAT 
ATAGTTTTTC 
ACAATGAGCG 
AAATGGTTTA 
TTCTGAGCTG 
ATTTATGCAG 
ATTAAGACTC 
AGGCCGAGGC 
GAAACCCTGT 
TCCCAGCTAC 
AGTGAGCCGA 
AAAGACTCTT 
GGTTAGATTT 
ATCCTCTAAG 
CTTCATTGAG 
CCGTGTGTGA 
GTCCAAAGAA 
CTTCCTTCAG 
TGTTCCCCTT 
CATCTCCTTC 
GATGTATCAA 
GAGATACTCT 
TTGAGATGGA 
TGCAAGCTCT 
ACTACAGGCG 
TCACCATGTT 
AGCCAGGATG
TGGGATTACA 
AATGGGGATA 
GTAAGTAGGT 
TTTTCATTAT 
AATACTAGAG 
TTCATGCCAA 
TGCTAAAAAC 
CCTCTTCCAT 
GGAAGAATCC 
AAGGAAAAGT 
TATGACTAGC 
GTCCCCCAAA 
TCTCTGAGGT 
ACAGTTCCAG 
AATGGGGAAA 
CAATTTCTAT 
CTCTTTGAGT 
ACGTTCAGCT 
GAGGCATCAC 
CCCATCTCAA 
AATGCTTATT 
TCACCATATA 
ATCATAACCC 
CTCTTTTCCC 
TAAGCAAACT 
CTCCTACTTG 
ATGGAATCCA 
TCATTCTTTC 
TCTTCTTTAA 
AATAGCCACA 
AATAGTAGGT 
TATTTTCTGT 
TCCCATGGAT 
AAGAGACTTC 
AAAATTCCAA 
AGGGTGCCAG 
CCCTCAGCCC 
AGGCACTCCT 
CCCAGATCAC 
GCTCTTCCCC 
GCAATACGTC 
TCATGGTGAT 
AGGTCACATG 
GGAGAGGGGA 
TGGAGGAAGG 
TCCTAAGATT 
TCACATGTAA 
TTTATGTGCA 
GGCATTGTTA 
TATGAAAGAA 
TTTTGTCCTG 
TTTTTAAGAT 
CAGTGCAACT 



GTCTCGATCT 
GGCGTGAGCC 
ATAGTACCTA 
GCTCAGAAGA 
CAGAACAAGG 
TAGCATCCCA 
TTAGAAAAAA 
ACCCATCATA 
TCGTGCAGTG 
CCAAGGCTTG 
TGAACGGGTC 
CACGTCCCAG 
TTTAAGGAGT 
GACGGAGGAA 
GGGAGAGGTC 
TCTTTTTGAG 
GTTTAGGTTC 
CTCTAGTTTT 
AAGACGTAGT 
AAACCTAGGC 
TTTAGACCTG 
GGCTTTCTAA 
AGGGAGATCG 
AATATCCCAA 
TACACCACAG 
CAGGGTTAAG 
GCATGGTTGC 
GCTTCTCCTT 
CTTTGACACC 
CTACATTTAC 
GTGACTTCTC 
GCTCTGAAGA 
CTCCCAGGGA 
GCCAGATCCC 
CCCCTTGTTC 
TGAACAAGAT 
ACGGTGAGGG 
ACCCCCTAAC 
CTCAACCCCC 
AATGAGGGGC 
AGGGGGTACA 
TTTAGGTTCG 
GTTCTGGGGG 
ATGTGTCACC 
TCTGTTTACC 
AAATACCCTT 
GGACTCTAAC 
ATATACATAT 
TTGAAAATGA 
TAAGAAGCGG 
ACTTTTAACC 
GTATTCATAT 
GGAGTCTCAC 
TCCGCTTCCC 



CCTGACCTCG 
ACCGTGCCCG 
TCTCATAGAA 
GTCGGACACG 
AGAGACCAGG 
AATGAAGGCA 
CACCTCTTCA 
CTACCCACAG 
TACAGCCTTC 
GTGACAGATG 
CAGAAAATGC 
GGTTCAAAGC 
CCTCTTCCAA 
ATGAAGGAAG 
ACAGCTAGGG 
GAAATGAACA 
AACTCTCTCC 
GTCTCCTTCC 
GCCCCATGGC 
ACCATCTTGC 
GGCACTATTG 
CTGGTCTCCT 
TGGTCCTCCT 
AAGACCCTTG 
ATGTTCAGGG 
GTACAGTAAT 
TCCGTCTGTG 
CAAGATCCAG 
CTAAGCATTT 
TTGCTATCAA 
AACCTCAAAG 
TGTTTGTTGA 
GCTGCTGGTG 
CTCTGCCCCT 
CTACTCACTT 
GACGACAAAA 
CTCTAAAACA 
AAAGAGCAGA 
AATAGATTTT 
TGATCCAGGC 
GCCAAGGTTA 
AACTCCTTGG 
TAGTAGTTCA 
TTCACCAAAG 
CTTGCCAGGA 
TTCAGAAAAA 
ACAGTGTCAC 
CTGTTACCCA 
TTGAATACAG 
ACTTGTAAGA 
TAAAGGATTC 
GATAGCTGAG 
TCTGCTGCCT 
AGGTTCAAGC 



TGATCCACCC 
GCCTACTTCA 
TTATTGTAAG 
AAGTAAGTGC 
TAGAAAATTA 
CCATTAAACT 
CAACCCCTTT 
ATAGCCATGA 
ATAGCTGTGC 
AGTTACTGGG 
ATAGATACAT 
TTTTCTCAGA 
AAATAGGAAA 
CCTCTAGATG 
ATCACCGGCA 
GAGAAGGCTA 
TGAAACATGA 
CACAGTGAGT 
TCCTCCTGTG 
CTCTTCTCTC 
GATTTCAAGA 
CACCTCTCAT 
TTCTTAGGAT 
GACTCTGTAT 
GGTAGAAATG 
TATTTCTAAT 
TAGACCTCCC 
AAGGCTATCT 
GCTTCCTGCC 
TTTCATTCCC 
CCCCTGTACT 
ATTAGAGACT 
TCCCCAAAGA 
CTTCCCACTG 
GAACCCTGCC 
ACAGCAATTC 
GAAAAAGCAA 
TCCTCATCTC 
CTCAGCTCCT 
CTGGGTGCTC 
TCCAGCCCTG 
CATCCATTGG 
AGGCCCGACA 
GAGGCACTTG 
AGACTGGAAC 
AACAAGCTAC 
TTGGAGAGCA 
TGTTCTTTGT 
ATGGTCAGTT 
TAGGTAGCTT 
TTCTACTCTG 
ATCTCTGAAT 
AGGCTGGAGT 
GATGCTCCTG 



GCTTTGGCCT 
CTTTCTTCAT 
AAGTGCATGC 
TTTTATCATC 
TTGTGATTCT 
TTGCAAATCT 
CAAGATATTT 
TGCTTTTTCT 
AACTCACATC 
TAACACAGAG 
GTGTAAAAAT 
TGTTAAAATG 
TGAAATGACA 
CAGCTTGAGG 
TGCAGGAACT 
AAATCAAGGA 
AGAGCTCATA 
CTGCAGGCTG 
GAGACAAGAG 
TTCCTTATTT 
ACCATTATCT 
CTAACTTCTT 
CCTTCAATGA 
GAGCTGGCTT 
CATAATTGGT 
CTCCCAGTAT 
ATCATCTTCA 
TGATCCCCAG 
TGCTTTAGGA 
TACCAGATTT 
ACCTTAAACA 
TTCATTCTGG 
ATATAAATGA 
TGCCCTGGGG 
TCTTCCTTAA 
CACTGATGAC 
GTTAAAGCCT 
ACTGCCATAA 
GGCTCTCATC 
CACCTGGTAC 
GTAGGTCCCA 
CTGCTTATCC 
CCGTAGAGTG 
ACAGGAAAGA 
TTTCACTTCC 
AGGAGAGACA 
GTCAGATCAG 
TCTGATAGAT 
TCACCTGGGT 
CAGTGATTAT 
ATAAGTGGCC 
TCTCTTTTTT 
GCAGTGGCGC 
CCTCAGCCTT 



CCCAAAGTGC 
TTAAAAAAGA 
AGTAATGCAT 
CTTATCATAA 
TCAGGTCTGG 
GTATGACACC 
GCCTCCTACC 
GGGACAGGTG 
ACAATCAGAT 
AGAGGATTCA 
CTGGTAAGGT 
AATCATGTAA 
TAGGTGTATG 
TTCATGAGAG 
CAGAAACCTA 
GTTCGTCAGG 
AATGCACTCC 
CGTGTCACTC 
ACCCAGGAAA 
TCCTCATTCA 
CTCATCTGGA 
AACAACACAT 
CACCCCAGTG 
CTTTCTGATT 
GAGTGATAGC 
GCCTTATACT 
ACCTCACCTA 
CTGAATGTGA 
CCTCATGGGG 
GGGTTCTGAG 
GCTCTTGCAA 
GGAGAACCAT 
GAAAAATGCT 
CAGAGGTACT 
TATTATGAAC 
TCCAATGACT 
TTGATTGCCA 
TTACCTCCTC 
AGTCACATAC 
GTATATCTCT 
TCCCCATTGG 
TTCAGCCACT 
GTCACTGAAG 
GGAAGGATGA 
TTCTATAGGT 
CCATTTTGTG 
CTTGTTCTCC 
AAAATTGCCC 
CAACCTAGGA 
TGCTATGTTC 
TCACTTGATA 
TTTTTTTTTT 
GATCTTGGCT 
CCAATTAGCT 
GGGACTACAG
TCACCATGTT 
CCAAAGTGCT 
TAACAGGTAT 
TCCCTTTGAG 
ACATCTCAAT 
AGGCACACAG 
CTCCACTCTG 
AAAACACCTC 
TAGGCCCTGT 
GCCCTGGGTT 
CCATCATACC 
AGGATGACCT 
AGGAATAGGT 
TTCCCTCTTC 
AAAAGATGAA 
TGTGGTTGTG 
TCAGACTCTG 
TTCGGGGCTC 
AGCCCAAAGC 
AGTGCAGAGA 
GGAGCAGGAT 
CCTCATTTTG 
CTCTTTCCTT 
TCCTATTCCA 
TTAAGGTGTG 
CCAAATCCTG 
GAGACAGAGT 
CAACCTGCAC 
TATAGGCGTG 
CCATGTTGGC 
CAAAGTGCTG 
CTGAACTGCC 
TCCAAATGCA 
AAATGGCATT 
CAACTTTAAC 
AGGCTGAAGT 
AATACCCTCC 
GCTAATTTTT 
ACTCCTGAGC 
AGCCAATTCT 
TATTCAGGAG 
CTCTTGCCTG 
TTCATTCTGC 
GTTACTCCTT 
GCTCCTCTTC 
TTTTTTTCAA 
TTAGAGACAT 
GGACAGGATA 
CGCCCCCCCC 
ACTTGCCAGG 
TTCCTTCATG 
CTCACAACAC 
CTCTCCCACT 



GTGCGCATGA 
GGTCAGGCTG 
GGGATTACAG 
AAATATACAA 
CCATATGCAT 
TATAAGGTAG 
CCTCAGCCAC 
CCACTAGAGT 
TCCCCAGCTC 
TCTGCCTGGC 
CTGCTGCTCT 
CGTACTTCCA 
GCAGGGTGTG 
CCCTATTTCC 
CCTGCTCCCA 
AAGCTCTGAC 
ATTTTCCATA 
ACTCAGCTGC 
CACACGGCGA 
TTCAAACAAG 
GTGTGAACCT 
GTTGAGGCTC 
TGAAGGGTGA 
GTGTGCTTTT 
ATACTCATGA 
TCTAGCCATG 
AGGAATAATT 
CTCACTCTAT 
CTCCTGGGTT 
CGCCACCACA 
CAAGCTTGTC 
GGATTACAGA 
TATGTGGCCT 
GATCCTTGAT 
GCCAATTACC 
TCTTCTCTTT 
ACAGTGGCAC 
ACCTCAGCCT 
GTATTTTTTG 
TCAGGAATCT 
TCTCTTTCTC 
ACAATGGTTT 
GACTGTGTAA 
TTTCCACATA 
GGCTAGAATT 
TGAGCCCATT 
CCCGAGCTTC 
TTTTGGTTGT 
CTGCTAGACA 
CCCACACACA 
CCCACTCAGT 
TCTCTGCTCA 
CCATGATTTC 
AGAATGCAAA 



CTGTGACCAG 
GTCTCAAACT 
GGGTGAGCCA 
AAGATTATTG 
GGAGAAAAGA 
AGACTCTAGG 
CTCTGAAACT 
ATAGGGGCAG 
CAGCAACTGC 
CCGAATCTTG 
CCAATCCAGT 
GTAGCCCTCG 
GGACTCTGGA 
ACCATCCCCA 
CAAGACCTCA 
AACCTCAGGA 
ATAGTCCAGA 
AGCCACATCT 
CTCTCATGAT 
GAAAGACCAA 
GGAGACAGAG 
CACACACCTG 
GTTGCAGTCC 
CTCTGCCACA 
TTAGACAGAC 
GTAGTTGAAC 
CCTTCAGTTT 
CACCCAGGCT 
CAAGGGATTC 
CCAGGCTAAT 
TCAAACTCCT 
AGTGAGCCAC 
CACCACTTGG 
TTACCCCAAA 
CCACTGCTCA 
TTCAGGGGGT 
AGTCATGGCT 
CCCGAGTAGC 
TAGAGAAGGG 
GCTCTCCTTG 
TCACACAACA 
GTCACTCCCT 
CAGCTTCCTG 
GCAGCCAGAG 
CACACCACAG 
ACCTACTTCT 
TTAACCAGGA 
CAAGACTGGG 
TCCTACATGC 
CACACATGAG 
TTGCCTGGGA 
AGTGTCAGCC 
CTGATGTTGT 
ATATCAAAGG 



CTAATTTTTG 
CCTGACCTTG 
CCGTGCCCGG 
GTTAAATAAA 
AATTAAACCC 
ATTGAGAAAG 
CCAACCAGGG 
AAGTGTGTTT 
TGCAGCTGTG 
TGCCTTTCCC 
GTGTCAGGGC 
GTACTGTTGT 
AAAATCCCCA 
AGGACCAAAT 
GACTTCCAGC 
AGGTGAGGCC 
AGTCAACAGT 
GGCTTGAAAT 
CATAGAACAC 
GGTCCTGCTC 
CAACAGGCCT 
CATCAACTCA 
TGTCTTTCTT 
CGTGGCTGCC 
TCCACTAAAG 
TCAGGAGTTG 
TTTTTTTTTT 
GGAGTGCAGT 
TCCTACCTAA 
TTTTGTATTT 
GACCTCAAAT 
CGTGCCCAGC 
AAGCCTGACT 
CTGCTCTTTC 
GGCCAATAAA 
CAGGGGAGAC 
CACTGCAGCC 
TAGGATCACA 
GTTTTGCTGT 
GCCTCCTCCT 
TAGAATCCTT 
TTTCTGTTCC 
GCTGGGCTCC 
CAATCTTTTA 
CCTACAGGCG 
TGGCCTCTAC 
GTTTGTCTAC 
GGAGTGCTCC 
AGATGGTAGT 
TAGTGCTGAG 
AATACTGCTC 
CCAGAGTGAC 
ATATCTTTCT 
GTAAAGACTT 



TATTTTTTTA 
TGACCACCCG 
CCTTGACATT 
AAGCAAGGGC 
ATGACTTGTG 
TCCCTTCCCA 
ATTCCGTGCC 
CCACCATACC 
CAGGGCAGTC 
ACTCCAGCTT 
AGAATTCAAG 
CTTCTTGCAT 
GCCTTGTTAA 
GATCTCAGGA 
TGTTTCCTTC 
CCCTCTCCAC 
GAACATGTGA 
TCTACTGGAA 
GAACAGCTGG 
TGAGGCACCC 
TAACCATGTG 
TACCATCAGC 
CCATATGACA 
ACCCCCTCAC 
CTGGTGGATT 
GTGCTCAGGG 
TTTTTTTTTT 
GGCACAATCT 
GCCTCCTGAA 
TTAGTAGACA 
GATCTACCTG 
CTTGGTCCTG 
GGAATCTCAA 
CTCTGCCTTC 
ATTAAAATAA 
AGGGTCTTGC 
TCAACTTCCT 
GGTGCATGCC 
GTTGCCCAGG 
TGGCATGAGC 
CAGCAACTTC 
CACCCAGCCC 
CTGCTTTTAC 
AAAGCCTGTG 
CCTGCACAAC 
TCCCCAGCAC 
TAGGTGACAT 
TAGCACCTAG 
CCCCCTTCCC 
AAAACCCGCT 
CCAGTCAATA 
TTGCCCTGAC 
GCTCATTTGC 
GTTTCCCTGC 



GAGACGGGTT 
CCTCGGCCTC 
TCTGAATTTT 
CATAGACACT 
GCTGTCTCAT 
GAATTTGGAG 
CTGCAACCTC 
TTGTTGGTCC 
CCTCTCCAGG 
GGTGGGCCAG 
GTGGTCCTGC 
TTCACAGCCC 
CTGCAACCAA 
AGCAAATTCC 
AAGATGCATG 
ATACCCTTGC 
TCCCACCCTT 
ACCCATGGAG 
TCATCCACGT 
ATGAAGAGGT 
TAGTAGGAGG 
TGTGTCTGGT 
GTCCTGGGTG 
TGCCCCCAGA 
CTAGAAAATG 
CAAATTAGAC 
TTTTTTTTTT 
CAGCTCACTG 
AACCTGGGAC 
TGGGGTTTCA 
CCTCAGCCAC 
AATTCTTACA 
ACTTAACATG 
ACCATCTCAG 
AGAACAAAGT 
TCTGTCACCT 
GGGCTCAAGC 
ACCACACCCA 
CTGGTCTTGA 
TACTACACCC 
CTTCAGAATA 
ACTCCACTAC 
TGTTGCTCCC 
ACAGATCACT 
CTTGTTTGTG 
TACTTGTTTA 
GTGGCAAAGT 
TGAGTAGGGA 
ACCCCCACGC 
TTTTAATCCA 
TCATTCTTAT 
TTCTCTGCTT 
TTATTGTCAT 
TCTCTCCCTT 
GGGGCTTGAA
TCTGCTCAAC 
AGCTATGAAG 
GCCAGGAAGC 
GTACCAAAAC 
ACTACAGGGC 
TCAGAAAACC 
CAGTTAACTT 
ATCAAATAAG 
ATCAGTACTT 
GCCGAAAAGA 
CAAAAGAACA 
CACCCTAGAG 
ATCCCCAACG 
CCCGTGCGGA 
TCCTGTTTGG 
TCCAGGTTCA 
AAAAGCATTC 
GACAGCCGGG 
GGCACTCCCT 
AGCAGGTCCT 
CCACCCCCGC 
GCAGGACCGC 
TTTCCCCAGC 
ACGGGGATGG 
AAACTTCCTG 
GAAGGTACCT 
AAGGGGTGTT 
TTAAATATTT 
ACCAACTTAT 
AAATTAATTA 
CATAATTGAA 
ATGATTAAAC 
TTAGGTAGTT 
CTAAGACAGC 
GGATGGAGTT 
CAGGTGGAAA 
ACCCCAAGTT 
CCTGGGAGGC 
TATAGCTATA 
AACTCCTCCC 
CTTCGCAGAA 
TTTAAGCTTG 
TTGCTGCTTC 
TAATTCCCAG 
TCCCTCAACG 
ATACACCTGA 
TAAGCATGTA 
TGTAGGCAAA 
GGGCTGATAG 
AATTCATCTA 
GCACTTTGGG
CCAACATGGT 
GCAACTGTAA 



CAGTGCAACA 
ATGAAATTTT 
TGGAGACATG 
AGGCCGTGCC 
TCAAAGTGCT 
AAAATCACTG 
AGATTATTTA 
CTAGTAAAAT 
ATCAGATATG 
TGTAGAGAGG 
GAATTGTTCA 
ATTGCCAAGA 
ACCTCCACCC 
CCACTCTTTC 
TCCTGCTGTG 
AGATGACTGG 
AGGAGCCCCA 
TAGTTGGGGG 
GGAGGGGGCA 
CACGGGGTCT 
CCAAAGTTAG 
CGCGCCCCTA 
GGTCTGCAAA 
TCTGGCCGCA 
CTCCAGAAGT 
GTGAAAAGCA 
GCTTGTGAAA 
GTTGATTGGG 
ATGATTTTCA 
GTCTTATTTG 
AGTCATAACA 
TTATCTGACA 
ATATTGAGGC 
AGACATTAGC 
CACTGGCCCA 
TATGATAAAG 
TTCCCTAAGG 
CATCATGCCA 
TTTTCTTAAC 
AACTTCAATC 
CACAAACCCC 
ATAAGCCCGC 
CATTTTGGTG 
AGAGCTCCGG 
TAACAGTATG 
AAGTTTGGGA 
TTTGCTCCAA 
TGCAAATTTT 
CAGATACAAT 
TACTATTCCT 
TGTAAAATGC 
CGGCCGAGGA 
GAAACCCCGT 
TCCCAGCTAC 



CATGGCTGGG 
ATTATTCAAC 
AGCTCTGCCA 
ATGCCTCATT 
GTGCTGAGGC 
TCAACTAAGA 
TGTTCTTTGT 
AAACGTATTA 
AATGTAACTT 
CCTCTTAATT 
GTTCAAACGT 
GTGGGGAAAG 
CAGGTCTCAC 
GCGCCCCCAC 
GGTTTGCTCA 
GGAAAAAACT 
GGCTTAGCTC 
AAGGGAGTGG 
GGTCCTGGGG 
GGACGCAGAA 
CAAACTCCCA 
GTTCGCCCGC 
AGCATCAGGA 
CGTCCCCGTT 
CACCCTACAG 
ACAGGTCTTT 
CACTAGGTGA 
GAAAGTAGCT 
AAATTCAATC 
ACTTAGAAAT 
TTAACCAATT 
AGTGTTTCAC 
CTGCTCCTAA 
AGTTGGGAGG 
CCTAAATTCA 
TCTGTGGCCA 
TGGCACATGC 
TCATTATAAT 
AAATTATAGG 
AAATAACATC 
ATAAAAGCAC 
TGTCCCTCAG 
TTAGTTTGTA 
CTATAATAAT 
GGATGCCACC 
ATTATTGCCT 
ACCTTTACAT 
GGCAATTCAA 
AACATTGGAA 



TTTTTTCAAT 
AAAAATTGGC 
AGGCAGATCA 
CTTTACTAAA 
ATTAGAGGCT 



ACTCATTTAC 
CTCTAATGCA 
CCAAAGCCCC 
CTTGTCATGT 
CGGCGTGTGA 
TTAGAAGCAG 
AACCTGAAAA 
TTAGCTCCTA 
AGAAGTGAGT 
ACACAGCACA 
TCAAAACTAA 
GCCCGAGGTA 
CAAAAGTGGG 
CGCCCAACGC 
GCCTTCTCGG 
GCACAGCTGA 
AGCTCAAGTG 
GCGGTTCCAA 
CGAGGGACCC 
AGTAGGGAGA 
AGCGCAAAGA 
AGCCCTCGGA 
GGAGAAGCGC 
AAATCTCCGC 
CTATTGCCTA 
CAGAACTTTA 
TCCAGTGTCC 
TCGCAATGTT 
ATACATTTAA 
ATAAAGCTTT 
AGATCCTACT 
AAACTTTACA 
CCCCAGACAC 
GGATGACAGA 
GGCCCAAGAC 
AAATATCCTG 
CCAACAACAC 
AGAATTTACA 
TAAGACCATG 
ATCCTGTCAC 
CTTGAGCTCT 
AGTGTATTAT 
GTTCTTTGCT 
CTCCTCGGTT 
TGGGCAATGG 
TAGACATTTC 
ATCTAGCAAA 
GAAAATCAAA 
ACATGTAGAA 
TTTTGGTAAG 
CCAGCTCAGT 
CCTGAGATCA 
AATACAAAAA 
GAGGCAGGAG 



ACTTGTAAAC 
GTGTGATGTT 
GTGTACCATT 
GTAAAATGTG 
CCCACAGAAC 
CTGTAGTACT 
GAGTTATATA 
CCTCCCTATG 
GCATTGCTTA 
TTGCAAATCA 
CATATACTTA 
GGCCTCTCTC 
TGGAATGGTG 
ATTCGTTCTG 
CAAGCACTCA 
CATTGGAAAT 
AGGAACTACG 
AAGTCACTCC 
CTATCTGCAG 
GGGGCTTGCG 
AAAAGCTAGT 
CTCACGCAGC 
CGGCCTGGCT 
TTCTTTTGGG 
GGCTCAGGAG 
GTTCTCTCTC 
CCCTTGGTTT 
CTGATCTGAA 
AAATTTTATC 
TTCATTTTGT 
GAAACACGTT 
GTATTGGGAT 
ACTGATTTAA
AGAGAGCGGA 
TACCCTAATG 
GAGAAAGAGA 
AAAAGCCTGT 
TACAGTTTTG 
CACAGTTTAA 
TCAGATACAG 
GTAAAGAAGT 
TGTGCTTCAA 
CACTATCACA 
AAAGGATCCA 
GATTTTAAAA 
AAACAATATT 
TTCAACAGGC 
CAGGATATCA 
TATTGATGAT 
ATATAATTAG 
GGCTCACGCT 
GGGGTTCGAG 
TTAGCCGGGC 
AATCGCTTGA 



AATGAATATT 
TAAGAATCAT 
GAATAAATTT 
GATACACGTA 
ACTGTGCTAC 
TGAAATAACA 
ATCTGAATTC 
CCTAGTGAAA 
CATGTTCATT 
ATAAAGCCTA 
ATTTTCCAGG 
AGGAGCCTCC 
AAGAATTCAG 
AGGTGGAAAC 
GGGAAGAACT 
AAACCCGAGT 
AGATTTATTT 
GCAGAGCCGG 
TTCAGTGGTA 
GATTGGGTTG 
TTCGATTTTT 
AAGCGCCCCT 
CGCGGGCCCA 
GGGCGGGGAA 
ATGCCCAGTA 
TCCTACAGCA 
TTAAATCCTG 
CTTTAGATAT 
TCAACCTTAG 
TTTTTGATTC 
CCACAGCCTT 
TATCTGGAGA 
TGGGTAATTG 
AAGGCTGTCA 
CCACCCTAAG 
AAGGAGGGTA 
CTTCAAGTTC 
CCCCCCCATC 
TTTTAGATTG 
CCCAAACCTC 
GCTGAGTTCA 
TAAACTTTGC 
AGAACTGAGA 
TCCCAATGCA 
GCTTTCCTTC 
AATAAATTTA 
ATTATTTTTG 
GGGCCTCGAC 
GGGCACATTG 
CATACCATAT 
TGTAATCCCA 
ACCAGCCTGG 
GTGATAGCAG 
ACCCGGGAGG 
CGGAGGTTGC
ACTTCATCTC 
ACTCGGGAAG 
GAGATCATGC 
AATAAATAAA 
CATTACCAAA 
ACCCTTAATT 
GACAATCTTT 
TATATTGCTT 
CATGGACCAA 
CAGCCGATGG 
CATTTATGTA 
AAGTGGGTTT 
TTTTCCAAAG 
CATCTTTGCC 
TGTTCCACCA 
CAAAAATCAA 
ATGCTATAAT 
GGTTGGAGTG 
TCCTGCCTCA 
TTTTGTATTT 
GACCTCAGGT 
CCACACCCAG 
TTTTTAGTAA 
CTATATTAGC
GTCTATTCTA 
TTTTTTTTTA 
AGAAGAAGAA 
GCCATGGAGT 
ATTGTCCATT 
GAGACCCCGA 
TAGGAAGCTA 
CTTGAGCCAC 
AACACTCACA 
TAGAATTAGG 
CTTATAGGCA 
AAGGCCAGGC 
GATGGCTTGA 
CAAAACAATT 
CTCAGGAGGT 
GTGCTACTGC 
ATCCACATCC 
GAGTGACAAA 
ACATCCCCAC 
CTCCCCAAAA 
ACGGTGAGTT 
GACTGGGGTC 
GAGTTATGCT 
AATCCTTCCC 
CCTACGTAAC 
TGTTACGGCA 
TCTTTTAAGT 
GCCTTGAAAA 
TGCTGTACCT 



AGTGAGCTAA 
AAAAAAAAAA 
CTGAGACAGG 
CATTGCACTC 
ATAAAATGCA 
ATTTTACATT 
CATCGCCTCC 
AATCTACTTT 
TGCCGTGACT 
TAATATCTAT 
ACATTGGTTT 
CAAGTTTTTT 
GCTGGGTCAT 
TAAGCATTTT 
TGGGTTTTTG 
TCACTTGTTG 
TCTACCATAA 
CTATAATCCT 
CAGTGGCCCA 
GCCTCCCAAG 
TTAGTAGAGA 
GATCTGCCCA 
ACTATAATCC 
ATTGAATTCA 
TATTCTCAGT 
TTTTTGTATA 
AGAAAAATAG 
TAAAAACTTG 
GGTTATAGGT 
GTATTTGGCC 
GGGCAAATGG 
TAAATCCAAG 
CCTATCCTTG 
AAGGCTGGTA 
CCATATTTGT 
CATGCATGAA 
ATGGAGGCTC 
GCCCAGGAAT 
AAAAAATAAA 
TGGGAAGATC 
ACACCAGTCT 
CAGGAAAGTG 
TGTGTGTTGT 
ATACCACTTG 
GATACTCTGT 
CACTGGTTAA 
CTCACAGACA 
GCCACAAACC 
CAGAGGCTAC 
TGTGAGAGAA 
GCCCTAAGGA 
AGGTCTGTAC 
GTGAAAGGTG 
CACACCTGTA 



GATCGTGCCA 
AATTAGCTGG 
AGAATCGCTT 
CAGCCTGGGC 
AAAATTAATG 
TCTATCTCCC 
CAGATTCCTC 
CTTTCTATTT 
GGCTTCTTTC 
TATAAGGACA 
GTTTCTACTT 
TGTAGACTTA 
ATGGTAACAC 
ATCCTCCTAT 
AATCAGGGCC 
AGAAGACTCT 
ATGTGAGAGT 
ATCTTTTTTT 
ATCCCGGCCA 
CAGCTGGGAT 
CGGGGTTTCA 
CCTCAGCCTC 
TATCTTTATG 
AGAAGTTTCT 
CTGCTGAATT 
TGTTTTAATA 
TGAAAATCAG 
TCATATAAAC 
GCCAAAGGCT 
ATTAAGAGAC 
TCTGAAGGTG 
ATTAAAAAGT 
CTCCACCTTC 
AGCTGGAAAT 
TGGGGTTCAG 
GGGAACTGGT 
TTGCCTGTAA 
TCAAGACCAG 
ATTAGTCAGG 
ACTTAAGCCT 
AGGTGACAGA
GTTGAAGATC 
GGAAAGAAAT 
TTAATCATCC 
CCTAACCCTC 
GAAGAGATTA 
CAGAGGGATG 
AAACACAGGA 
AGAGGGATCT 
TAAATTTCTT 
ACTTGATATA 
CCTTCCTCCC 
TTTGAACTGG 
ATCTCAGCAC 



TCGCACTCCA 
GTGTGGTGGC 
GAACCTGGGA 
AACAAGAGCG 
GATTTTAGTA 
CAAAAAGAAA 
CATTCTCCTC 
GGAACATTTA 
ATTTAGCATA 
TACCACAACA 
TATGGCTATT 
TGTTTTGATT 
TGTTTAACCT 
CAGCAGTGTA 
CCAGATAGAA 
TTTTTCATTG 
TTATTTCTGG 
TTTTTTGACA 
CTGGCTCCTC 
TACAGGTACC 
CCATGTTGGT 
CCAAAGTGCT 
TCAGGACTAC 
CAACTTCAAA 
TCCCTAGGAA 
TTTTCATAAG 
AATACTGGGG 
AAAAAAGAAA 
GCAGAGAAAT 
TTAGAAGACT 
AATAGATCAT 
TGACTGAACT 
TGCTGCAAGC 
GACAAAAATT 
ATTTTCATGT 
ATAGGGCTGT 
TCCCAGCACT 
CCTGGGAAAC 
TGTGGTGGCA 
GGGACATTGA 
ATGAGACCCT
TACTTTTCTC 
GGGGTGAGAG 
TTTTCCACCC 
AGTACCTGTG 
TAGTGGAATA 
ATGGCCAGGT 
AGCTGCTAGA 
TGGCCCTGAT 
TTGTTCTAAG 
CATTTCTTTT 
AGTGTCAACG 
TAATGAAAGA 
TTCGGGAGGA 



GCATGGGAGA 
ATGCACCTGT 
GGCGGAGGTT 
AAACTCCGTC 
TATTTACAGA 
CCATGTTCCC 
CTCCTCCCCT 
GTATACATAG 
ATGTTTTTAT 
TATTTTATTT 
GGGAATAGTG 
TCTTTTGGTT 
TTTGAGGAAT 
TGAGAGTTCT 
CAAAAATGTG 
AAGTGTTTTG 
AGTCTCAATT 
GAGCCTCACT 
CTCCCAGGTT 
TGCCACCATG 
CAGGCTGGTC 
GGGATTACAG 
ACTGTCTTGA 
TTTGATCTTT 
TTTTAGGATC 
AAACTTTTTT 
GTCAGGCGCA 
TGACCAATCA 
GGTGTCAGAT 
TAAGCCATAG 
TTCACCTTTA 
GTTAAAGAAG 
AAACAGAAAT 
ACTCCTGGGA 
ACACTTGGGA 
GTTCATAAGG 
TTGGGAGGCC 
ATAGGGAGAT 
CACACTTGTG 
GGCTGTAGTC 
GTCTCCAAAA 
TGTAAACCTA 
CTACGTAGAT 
ACTTATGGGA 
AACCTGACCT 
GGGTGAGTCC 
AGAGATGGAG 
AGTGGAAACA 
AATACCTTGA 
CCACCCAGTT 
ACTGTCATAG 
CATGGAATTC 
AATCTCAGCA 
TGAGGCGGGC 



CAAGAGCAAG 
AATTCCAGCT 
GTGGTGAGCC 
TCAAAAATAA 
GATGTGCAAC 
CTAATTCAGT 
CCCAGCCCTA 
AGGCATATAA 
GTATGTTTTT 
ATTCATTCAT 
CTGTTATAAA 
ATATATCTAG 
TGCCACATTC 
GATTTCTCTC 
GTTATTCAGT 
GCACCCTTAT 
TTATCCCATT 
CTATTGCCCA 
CAAGCAATTC 
CCTGGTTAAT 
TGGAACTCCT 
GCATGAGCCA 
TTACTATAGC 
TTTTGGAAGA 
TATTATCAAT 
CATTTAAACT 
TTTAACAGGC 
CATTGTGGAA 
ATACCTGAAA 
ATTGCTCAGT 
AGAGAGCAGG 
AAACTCTAAT 
GCTGAAATTC 
AAGTCAGATT 
AAGGGTTTAG 
TCAAGAGTTG 
GAGGCAGGAG 
GCTGTCTTCA 
GTCCCAGCCA 
AGCCATGATA 
AAAGAGCTGT 
ATAAAGAATA 
GCAAAACAAT 
TGAATTGCAT 
TATCTGGAAT 
TCCAACCAAT 
GCAGAGATTG 
GGCAAGAAAG 
TCTCAACTGG 
GATAGTACTT 
AAGTTTTGAA 
CTCTCCTTGT 
TGAGGCCAGA 
AGATCACTTG 
AGGTCAGGAG
AAAAATGTTA 
AGGAGAATTG 
CTCTAGCCTT 
TCAGCATTAT 
GCATCATAAA 
TACCAGATCT 
TGACTGCTTG 
AGTATAAATA 
ACCTATGAGG 
GACAAAACCC 
GCAAGACTCA 
GGCTTGCAAC 
ATGTGACTGG 
TTCCATACAA 
AACCAGACCC 
CTTTTACTTT 
GCCTCACTTA 
CTCACTACCT 
TGAAGCATTT 
TAGCAAACAA 
TTTAAATCTT 
CCACACCTAC 
TTGCCCTTAA 
TCCTTCAGTT 
CTGAGTTTCT 
TAAGACAGCT 
CAAGCCGTCT 
TTCATTATTT 
TTGCGGGAAG 
AAAGATGGTT 
TAAGCTGTAT 
TCTGGGTGGG 
TGGTTTGAAC 
CAGCTGAGTC 
CTAATGATTG 
TTATCAGGAA 
ATTACAGTTC 
CTCAGCTAAT 
CAGTTCATCA 
AAACCACCAT 
CCCCTTTTTT 
GACCACTGGT 
TTTTTAACCA 
TTAATTCATG 
CACATCCTAT 
CTTGTTTGGG 
ACCCCCACCA 
ATGGGTCATA 
CATTATACAA 
AGACTGTAGC 
CAGTCTGAGG 
GTCCCTTGAT 
ACTGGAGCAG 



TTCTAGACTA 
TCCTAGCCGG 
CTTGAACCCG 
GGTGAGAGAG 
AGAATAAAAA 
ATGGTCTTTG 
CAGCAATTGT 
TCTCATTTAT 
TTTGTGCATT 
TAATATAAAA 
CTCAGACACT 
CATCTCCAAA 
TCTAAGGGGG 
CAGCTGCATG 
TGTCTGGAAT 
AGGGTGCAAC 
TTCTTTCTTT 
TTCACCCCCT 
ATGTCTTCTT 
TGGTGAGCTA 
GGAAGCAGTA 
CTTAGCACTC 
ATGGGCACAT 
TCATCTATGT 
GCTAGCAAGT 
TGCCAGGCCA 
TGTAACCGTA 
TGTGCCCAAG 
TTCCAATTTT 
CATATACAGG 
TAATAGTGTC 
GCCCACATAT 
TCCACACAGT 
TCCACTAGGT 
TTCCCACAGG 
AGGCTTTTAG 
CTGGGTCTGT 
CTCCACATAC 
TGCAAAAACA 
TAAGAAGGTT 
ATTTACTCAA 
CCAGTGAGAA 
ACAGGAAGGG 
AGTTGCCTAA 
ACAAGCGTAC 
TTCTAACTTA 
CATTCCTTTT 
GTCCTCAGTC 
ACACACATCA 
ACAAGTTATT 
AACTTTTTGT 
AAGGTTAGTT 
GAGTTTTCTC 
GGCTTGTTGT 



CTCTGGCCAA 
GCATGGTGCC 
GGAGGTGGAG 
CAAGACTTGG 
TGTTTCCCCT 
CCAATGTTAT 
CACTATGTTC 
TTGTTTCTCG 
TTGTTGTTGT 
CTCATGTTTA 
GAGTTAAAGA 
AACCGAGCTC 
TCTGTGTGAG 
CACCAGTAAT 
CTATAGATAA 
ACCAGGCTGT 
GGAGGCAAAA 
TTGAGAATCT 
GAAAGACAGA 
AGGTAGTGAT 
AGCAGGTTTC 
GGAACCATTT 
GTGCCACTTT 
GTAGACAGCA 
AGTCGAGAGC 
CAGTAGTCAG 
TGATTCAGTT 
TAGCAGGCCC 
CTATAGCTAT 
GAAGCCCAGG 
AATAACACAA 
CCAGTATAAT 
TTGCAACTTT 
GGCTGTTTTT 
AAGGGTGAAG 
GACCCAGAAG 
AGGTACTAAT 
ATACATAACA 
AATTTCTTGT 
TGAAATACTG 
GGATCCAGTC 
TCAAGGGGGT 
CCACTTTTCC 
ATGACACAAG 
TTATTTTCTG 
TTACTATTAA 
TCTTCTGTTT 
CTCAATCTTA 
GGTTGGTCAT 
TTTAGAGTCT 
CCTACCTCAG 
GAAGTCTTTA 
ATGTTTCGGC 
CTTCTTCAGT 



CATGGTGAAA 
TGTAGTCCCA 
GTTGCAGTGA 
TCTTAAAAAA 
TCCCCCCAAA 
TTTTATTATA 
TGTAAAAATC 
TGTCATACTG 
TAAAACAGCT 
ACACTTATTT 
AGGAAGGGCT 
CCTGAGTGAG 
AGGGTCATGA 
CAGAACAGAA 
CATAACCGGT 
CTGCCTGTGG 
ATTGGGCATA 
CACTCATTAG 
TTGATAATGA 
GAAGCTTTTT 
TATTAATATT 
TTCAAACATG 
TGTCATATTT 
ATTAGTAAGG 
CAATCCATTT 
GGCTCTGCTG 
GAGCATGTAA 
ATAATATTGT 
GCTTTTTTTT 
AGTTTGCCTG 
CTACCTGCCC 
CCAGTGGGGG 
GGGAATTTAC 
ATAGTACTAT 
TCCTTCCCCA 
TTATCAGGGT 
TCTCGTGCTT 
TGAAGTGACA 
TTTTCCTGGA 
GCTCAGGGGA 
CAGCCCCAAC 
TGGTTATTAC 
CTTTCTGAAG 
ACCAGTATCT 
CCATATAGCC 
TGACAGCACA 
TGGCTAACAC 
TTTCAAAAAC 
TTCTTGGGCT 
TTGTACACTT 
TGACTTGATG 
CTGTGCAAGT 
CATGCATGGA 
CACTTTGCAG 



CCCCATCTCT 
GCTACTCAGG 
ACTGAGATCA 
GAGAAAAGAA 
CTTTAAAAAA 
ACAAAGGAAT 
ACTTCCTAAA 
CAATGGATAT 
TTTTTGGCCT 
TTGTAGGAGG 
TTATTCAGCT 
CAATTCCTGT 
TCGACTGAGC 
CAGGGATTTT 
TAGGTCGGGG 
ATTTCATTTC 
AGACAATATG 
TGGGAGTTCT 
TTCATATAGT 
ATCATTTGGA 
ATAACTCCTA 
GCCCCAGAAA 
CTAACTATGT 
TTAAATTTCC 
TGATAGATAG 
GTCTTATTAG 
ATGGGGGTCC 
ATGATTCTCT 
TTTTTTTTTT 
TCTTTATGGG 
ACTGGTCAGG 
CTGTCCAGTC 
TAAATAGATT 
TATACAGTTT 
CTTTTGCTAT 
GAGTCTTTTG 
CCCATGGCCA 
TTGAGAGACT 
ATTTCTAGTA 
GCATTTATAA
TATTTCTAAG 
TAGTTCTAAG 
GTGGACAGGA 
ACATTTATTT 
TCTTTCCTAA 
GGCATCAAAT 
TTTACTCGTA 
TGTGGTCGTG 
ACCTGCCTTG 
ATAATAACCA 
TATACACTGG 
CCAAATTTTA 
CCAGTCAGCT 
GCGTTGGCGA 



ACTAAAAACA 
AGGCTGAGGC 
CGCCACTGCA 
AAATGAAATT 
GCAGAAGTCT 
CTTGCAAGGC 
ATGTCTGAAT 
CTGTCTTGTT 
GTCTTCTTCC 
ACAAGCTACA 
GGGAGCTTTG 
CCCTTTTAAG 
AAGTGGGGGT 
CACAGTGTTT 
GTCAATCTTT 
TGCCTTTTAG 
AGGGGTGGTC 
CACTTTTATT 
ACACTTGTGC 
GAAGTACAGG 
TTATAAGAGT 
CAAATCCATA 
CTTCAACTAC 
TACAGACCCC 
CATTTTGCAT 
TAATTATTTC 
CATATCCCCA 
CAGGGGGCCA 
TTTTTTTTTT 
CAGTAGGAAG 
TAATTTGGCA 
CCGGTGGGAC 
TTTCTTAGTG 
TTGCCCAAGG 
ACAGTATTGT 
AGCTGGGAAT 
TTGATCTCCC 
GGGCTACATG 
CTGGCACATT 
ACTTCTCCTC 
GTTACACGAT 
GGGTTACACT 
TTCTTTTTAT 
CCACGCAGTC 
TGAACAGAAC 
TTCAAGGTGA 
TCGTTTATGA 
GGAGGCTCAG 
TATAGAATAG 
TAAAATAATA 
GAACAGCCCT 
AGGAAAATGA 
TCCGGGTGTG 
AGCTGCCACG 
TACAGCTCAC
ACTGAGTCCA 
GTGGCAGGTT 
CTTTGGTACT 
AGCATGGAGG 
TAGCAGGGCT 
TAGTTGTTTG 
CCATTTTTTC 
GGGCTGGGGC 
AATAGATGTA 
CAAATCCTGC 
CAATTGCATC 
GATGTCTTAT 
GGACTAAAGT 
AATACCACCA 
AATTCTTAAG 

ATGAGAGACA 
GACCTGCAGG 
CAGGCTGTAG 
CACCTTAGTG 
TTGGTTTTGT 
TGGTATCCTG 
AAAATGAATG 
TTTAGTGGTC 
GACTCCTCGT 
ACTAAGGCTC 
TGTCTTCGAC 
TGAGGCAAAA 
AAGAAAGAGT 
TTTTCCCCTG 
TGATGAGTCC 
TAGGGTCCTT 
TCTTCAGGCT 
TTAGTTTTGA 
CCTTTTGTTT 
CTGAGAAATT 
CATTTTATAT 
ATATTGGTGT 
TGATTTTTAA 
ATTTTTGTTA 
TTCAGTTCAC 
AATTTTCTTT 
CTTTTAATGT 
ATATTTTACT 
GAAGGAGGCC 
CACACATTTT 
ATTGCAAACA 
TACAACATTC 
GATGATAACC 
GGCCACAAGA 
CTTTTGTTGA 
TATTTAGTCC 
TGGGAGOAAC 
TTGTATGGTA 



AGTCTACTGA 
ATACCTCTAC 
TTAGGGTGTT 
TGGTTAATCT 
GCCTTTAAGT 
GTTGCTGCCA 
GAGGCCCAGC 
TCTTTCTGAC 
CGACATGAGT 
GTTTATCCAA 
AGCTATTTGA 
TAAATAGATG 
TTTTCCTCCC 
ACAAGTGCCG 
CACATCCGCT 
CTCACTGCAT 
AGAAGTGAAC 
GTGCTGGACT 
CATCCTGGAA 
GAAAGGGGAT 
TTAATGTGTG 
CTTAATTATC 
TATGATTTTA 
CACACAACAT 
TGACACTGGG 
AGTCTGAGGA 
TTGGCAAGTC 
TTGACTTGGT 
AATAGAATAG 
GGACTATGGC 
ATCCCTTTTT 
CCTAGGCTGG 
GGTGCTGGTT 
GGGAAAGGAA 
TAAATGTGGG 
TCCTTTAGCA 
TTGACAACGC 
GTTATTAATG 
TGTCTGACCA 
AAGAACAGGT 
AGAAAAACTG 
ACAATTAAGG 
AGGTAAAAAT 
TTCCTTACAT 
TAATTACTTT 
TTTCATGACT 
TCCCTTTCTT 
TTTTTTATAT 
ATTCTTTTCC 
TTAGAAGTTA 
AAACCTTGTA 
AAATTAACTT 
CATCTATCCT 
ATTAAGATTT 



TGTTCAAGGA 
TCAGTCACTT 
GCAAATTTCA 
AGCATTTGTT 
TTAGGTTTTG 
AGGCTCTTAA 
CTCGGCCAGG 
ACATAGAGTG 
TTTTCTTTTA 
TCTACATTTT 
TTTTGGGATT 
TGAGAGTTGA 
TCTGGTTGAT 
CTCCAGTTAT 
TGGGGATGAA 
CCCTTCAGGT 
TTAGTTTTGG 
TTGGGATATA 
AACAGTTACC 
AATCTGGCCC 
GACAGAATAT 
AAAGTTTGTT 
GGAAATTACA 
TCGACCAACT 
GTCTTTATTG 
GAGTCAGGAG 
CCCACAGGGT 
TATGTTAATA 
ATGAAGGAGT 
CCATGACTCT 
CACCGTATGA 
CTCAAGTTTT 
TACAGAAAAT 
AGTGGAAGAT 
GACATCAGCA 
CCTATTTTTA 
TTCTTGTATG 
TTAAACTTAG 
TAAGGTAAGA 
TAGTGCTTTA 
TATGATACCC 
TTTCAAAACT 
CCACATTCTT 
ACCTTGCACA 
TAAATTATAC 
TTCACAGACA 
TAAACAACTA 
AAATTCTGCC 
AAAGCGAACT 
CTATAATACA 
AGTTTGGGAT 
AGAATTGGTA 
CCTGTCCTGA 
AGATCCCCTG 



TGGTCTTGGA 
TCAACTGGGC 
ATGGTTATGC 
AGCCAATGAT 
TCCAAGAGTT 
GCATGGAGGC 
GCCCCACAGT 
TAAAGGGTTT 
ACTCATGAAA 
TATTAACTGT 
TAAATTGATC 
AAGACACATA 
GAAATGCTAG 
TTGGCAGAGT 
CAAAGGCTGA 
CTCCAAGGAA 
GAGATGGAAG 
GCAGAGAGAG 
ATGCAGCCCA 
TCTGGCCTGC 
TTGATCCATT 
TTAAGTCTTT 
AAAACCGGTT 
ATGGCATAAA 
AAATCTCTCT 
GGACAGAGGT 
ATAACAAGGC 
ACTAGATGGT 
TAAATTTTTC 
GGAGGGGGTG 
ACAACAGTCT 
CCTTCTTTCC 
TCTAGGGGTG 
AAACCAAGTA 
GTGGACTTTA 
TTAGTTTTTA 
TTTATACCAG 
TTTTAATAAA 
TTTTTATAGA 
AGAAAAACCC 
CTTAACTTTA 
TGCTTAAACC 
ATGCATCCTC 
TAAACTGTTT 
AACATTTCTT 
ATTCTTCGAC 
GTTAATTTAT 
TCCTCTTTAT 
TCTTTTATGT 
TGTTACACTG 
TTCAATTATC 
TAGATGGCTT 
AGGGAGTTCC 
TTAGGAAACC 



AGTTGGGCCC 
TTTCTGATAC 
AGGGATTTTC 
GTATTTATTA 
AGCTTATCTG 
CAACCCTTAG 
CTGGGTCAAA 
TGTCAGGTCA 
AACTCATTGC 
CACCCACCAA 
TGCTATTCCC 
AGGGTCTTCT 
GGTGAAAGGG 
GCCCAGTAAA 
CTGATTGAGA 
TGCTAAGTTT 
CTGGATGGCC 
CTTGGCACGA 
TGCCTGGTCA 
CATGTGCACA 
CCAACTGGGC 
AACTTCTATG 
GGGGCAGTCC 
AGCTCTACAT 
GGATTAAATG 
ACTTTTCTGA 
AAGCATTAAA 
CAGAAATAGA 
TTAGCTTTAG 
GCACTTTCTT 
CGGTGGTTAG 
ACCCTTTGAT 
GTACATGTGC 
TATAACTTTT 
TAGTCCTTGG 
GACCAAAGAA 
ATAAGCTAGA 
ACTCTGTAGA 
CTTTTCTTTA 
GTTGTGTTTT 
GCCAATATGT 
TTCAAAACAA 
ATAATCCTTT 
ATTCAATAGT 
GCATAAATTT 
ATGCCTCAAC 
CTCAGGACAA 
TTCCTTTTTT 
CTGTGGACTA 
TTAACTTTTA 
CTTTGCTATT 

T Ti ' i-rrrm ' 



TCCTAGGTCT 
TGCCGGGTTA 



ACTAGAATTA 
CAGGAGCAAG 
ACATAGCAAA
AAGTCACCAC 
CCTCTTGTGC 
AAACTCCATC 
ACTCCAACCG 
GGTAGCCCCA 
TGTTGGTTGT 
AATATTGACT 
TGTGGGACTC 
CTTGCTTTAC 
ATAGCCAACT 
GGTCCACCAC 
AGCTCCTGAA 
CCTCCCTGTC 
CTCAGGGGTT 
CTTATTACTC 
ACAGGAGGAC 
AGCATAACAA 
ATTTGCATCT 
ACCCTCTAGT 
ATCCTCGCTC 
CAGGGGGCAA 
GTCTCAGTTT 
AGTACAGAGA 
TTCAATAGTT 
GTGAGGGAAG 
TTTGGTAGGG 
GACTCGGGTG 
CAGCACAAGG 
GAGAACATGA 
TAAAAGACTT 
AAGAAGTTGA 
TGCCTTCTTA 
AGTCAAATGC 
TTTCACCTTT 
CATATTTATT 
ACCTTTTATA 
TATTTTAATG 
TTAGACACAG
TTTTTGTAAC 
TACCAAAGGT 
TTTACATTTA 
ATTTTTCTAA 
TTTCTGACTT 
GGATTTTCCA 
TTTTTCCGAG 
GACTGTCTAA 
GCAAACTTTA 
AATAAGACCT 
TTTAATTACC 
GGTCAGAGCT 
AGAGAATTTT 
CAGTGGTTAA
TGAGGTGTGC 
AAAGCAGTTG 
ATTTTTGATA 
CTTCCCGGTG 
TGAGGCGTTC 
AACTGGCTGG 
TGCCTTGACC 
CCTGTCCCTT 
TGAGCAAGCA 
CAGCACAGGG 
CCTGTTAGGT 
TGTGGATTTC 
GCATAAGACA 
ACCCCAAGTA 
AAAGGCGTAA 
TGTGAAAAAT 
ACACCCTCTC 
GAGAGCTGTG 
ATATACCCAA 
TAAACTTTCT 
AGTGGAGGAA 
GAAATGTGAA 
TAAATTTGTT 
TAAGACACTG 
AATTAGAATG 
GCCGGGCGCG 
CAGGAGTTTG 
AAATTAGCCG 
GAGAATCACT 
CCAGCCTGGG 
TATATATATA 
GGAAAATCCA 
AAAGAGAGCA 
ACTCAGGAAT 
TATCATCTCA 
ACTGAGAGAG 
TAGATGTCCA 
ACCTTTGTGT 
ACCTACTGCA 
GTAAGAGATT 
TACACCCCGC 
CGTTGGCCAG 
CAGATCATCT 
TACTAAAAAT 
GGGAGCCTGA 
TCTTGCCACT 
AAACCCACCA 
AAGCAGTGTT 
CAGTGGCTCA 
AGCTCAGGAG 
ACAAAAATGA 
GTGGGAGGAT 
CACCCCAGCC 



TGTTAAATCA 
TCACAATGAG 
CCGCTACAGA 
GGAAGGCCTT 
TTCAGCCACT 
CACCGGGGCA 
AGTTCCCCGT 
GTCCGTTAAT 
TTAAGGGCTT 
GGGGGTACGT 
ATTTTCACAA 
CAAAGGTCGA 
ATTTCTCCCT 
ATATGAGGGG 
AATTGGCAAA 
AGATATTTCT 
AACCACTAGA 
AGTCCCCACC 
GTCTCCCCTC 
TATCTCTCCC 
CTATATATCC 
AATGGGGAAG 
AAACAAAAAC 
TGTGTCAAAA 
CTGTAAGGAT 
CACAAGGCCA 
GTGGCTCATG 
AGACCAGGCT 
GGCGTGGTGG 
TAAACTCAGG 
TGACAGTGTG 
TATATATATA 
AAGCACTTGG 
TTAACAAATT 
CCTCATACAC 
AACATATCCC 
GTGGAGGTAA 
CATAGTTACG 
ATATTGTTCC 
CACAGTAAAT 
GAAGATTGTT 
CACCCCGCTT 
GGGTTGTGGC 
GAGGTCAGAA 
ATAAAAAATT 
GGCAGGAGAC 
GCACTCCAGC 
AAACTTTAAA 
CAGGAAAGTC 
GGCCCTGTAA 
ATCGAGACCA 
GCTGGGAGTG 
CTCTTGAACC 
TGGATGATAG 



TCTTCTTTTT 
GTTTCCTGTA 
TTGAATGCAT 
AATGCTTTTG 
GCGTTGATCC 
ATTGCCTACC 
AGGGATGCTC 
CACCTCTGTC 
ACAACTCTAA 
GACTGGGGCT 
TGCTTTTCCA 



TCTTTAACCA 



TTTAATTTTT 
TGGTCTCCTC 
TATTAATAAA 
GTGGGGAAAA 
GACCCTAAAG 
GTTACCTCCC 
CCCATATGTC 
ATATATACAT 
ACATATACCT 
AGAGAAGAAG 
CACACACAGA 
TTAAGAATTC 
GGTAGAGAAT 
AGAAGAACAA 
CCAGTAATCC 
GGCCAACATT 
TGGGTGCCTA 
AGGCAGAGGT 
AGACTCTGTC 
TATATATATA 
TAATGAAAGA 
AGAGAGCTGA 
TGCTGATGGG 
ATAAAGGTGA 
AATGAAGTCA 
TGGAAGAATC 
TGGCAGGTAG 
GGCCAGGCTG 
CCCTGGTCTG 
CCCATCTTTC 
TCACACCTGT 
GTTCCAGACC 
AGCAGGGCAT 
TCACTTGAAG 
CTGGACAACA 
TCTACCTATG 
AGATGAATAC 
TCCCAATCCT 
GTCTGGACAA 
GTGGCGCACA 
CAGAAGGCGG 
AGCCAGACCC 



TCTTTTTTCC 
AAAGTTATTT 
TTGGGCCATC 
GAATATGCCC 
TCCACGAGGG 
TGGGAGCGCT 
CACAGGGCAG 
TCCAAAAACC 
GGGGGTCTGC 
GCATGCATCA 
TACAATGTCT 
GACCCAGGGT 
ACTTTTTCTT 
CCTTAATTTA 
GTTATGGCAT 
CATTTGTTCA 
TACCCAGGGG 
CAGAAGGGAA 
CACATATACC 
ATTTATCTGA 
AACCCTCTCA 
TTATCAAAGG 
AAAAAAAAAC 
CGGTTCAATG 
TAAATGTCTG 
AACAGAAACT 
CAGCGCTTTG 
GTGAAACCCC 
TAATCCCAGC 
TGCAGTGAGC 
TCAAAAAAAA 
TGAAATAAAT 
AAGGTAAAGT 
ATAATGCTCA 
AGTGCCCACT 
CAGGAAAGTG 
CTGCACAATA 
CGTAAGATAC 
GCATGGAGGT 
AGCACTGACT 
GGACCCTGCA 
CTACCTGATT 
AATCCCAGCA 
AGCCTGGCCA 
GGTGGCACAC 
CACAGTGATG 
GAGTGACACT 
GCCAAATGCC 
CCTAAAATTA 
TCTTGGGAGG 
CATGGTGAGA 
CCTGTAGTCC 
AGACTGCAGT 
CCATCTCCAG 



TTAGGATACT 
TTTTACTTTC 
CGCGGGTTAC 
TGACAACAAA 
CCTGCCACGT 
CTCCAGATCT 
GCCTAAGTCG 
AGCTCCCTGA 
ATGAGAGGGT 
GTAATCAGAA 
GGAATCTATA 
GCGGTGCCGG 
TCTTTGGAGG 
AACAAAATTT 
AGAAAATAAA 
TTAGTTATCA 
CTAATAATAA 
GAGGAAGAGG 
TGACCTCCCC 
CCTCTCCACA 
CACACATATA 
ATAAATCTAG 
ACACACAAAA 
AAGGATCCCA 
AATCAGACGA 
CCACATAAAA 
GGAGGCCAGG 
ATCTCTACAA 
TACTTGGGAG 
TGAGATCACA 
AAAAAAATTA 
GAACAAGAAA 
GATGTGTCCT 
GTATTGGTGT 
CCCTGGGAAT 
TGGGCTGACT 
TAGAGTTGGA 
ACACACACAC 
TTAGAGGCTT 
TCCATGAAGG 
ACTGAATATG 
AGAATAGCTT 
CTTTGGGAGG 
ACATGGCGAA 
ACCTGTCATC 
GAGGTTGAAG 
TTGTCTCAAC 
TGCTAAAATG 
GATGCAATGT 
CCGAGGCGAC 
CCGTGTCTCT 
CAGCTACTCA 
GAGCAGAGAT 
AAAAAAAAAT 



TCTGAACCGG 
TTCTGTTAGC 
TGGGTTAAGG 
GTGCCAGTTC 
GCTGCTCTGG 
GTGTCGCTCA 
CCTAAGGGGC 
GTGAGCAATT 
CGTGATTGAT 
CAGAACAGAA 
GATAACATAA 
GCTGTTTGCC 
CAGAAATTGG 
TCAAAGTCCT 
AATGATTGTA 
GTTAAAATTC 
GAAGGGAGGA 
GTGACTCCAG 
TCCCCAAAAT 
TATGTATACC 
GCTGACCTCC 
GTCATACTCA 
AAGAAATTGA 
TGGATAAAGT 
AAGGATGAGT 
AATGTATGAG 
GCGGGCCGAT 
AAAATACAAA 
GCTGAGGCAG 
CCATTGCACT 
TATATATATA 
TTTAGATACA 
TTTGCATTTA 
GGATATGGAG 
ATTTTCCAAA
GATATCCTTC 
AGCAATGGAT 
ACACACACAC 
TCTACATCAC 
GAGATTGAAG 
CAGAAAAAAG 
TTTCAGAAAA 
CTGAGGCGGG 
ACCCCATCTC 
CCAGCTACTC 
TTAGCTGAGA 
AACAACAACA 
AGCACCCAAG 
TGGCTGGTCA 
AGATCGCTTA 
ACAAAAACGT 
GGAAGCTGAG 
CATGCCACTA 
AAAGAGAGAG 
AGAGATGCAA

AGCATGTTAG 

GTACCAACTT 

CTCCATGCTC 

AGCACTGACA 

CCTGGTCCCT 
CCATTTCCCA 

AGAGGTGGAA 
CCGAGATGTT 
AGGTAGTGCT 
ATGCATTCCC 
GTTCCCTTGT 
GCTGCTCAAA 
CCCCACCACA 
TACAGGGTGC 
ACCGTCTACC 
TGTACCGGAA 
TGGAGGTAGC 
TTGGAGTCAG 
CATTTTGCCA 
AGCTATTCGT 
AAAACAAAAA 
GCCGGGGACC 
CCTTCCCAGC 
AGGGTCTTTG 
CCTTATAAAA 
GAGTAGACAC 
TGGGAGAGAA 
ATACTTTGAT 
GCCAACAAGT 
TGGACTGAAT 
GATGGTACTT 
GTGTCTCTCA 
CTCATGCCTG 
AGTTTGAGAC 
GGCCAGGTGT 
TCCCTTGAAC 
TGATAGAGAC 
CTGATGCCTG 
AGTTCAAGAC 
AACTGGGTGT 
TTGCTTGAGC 
AACCTGTCTC 
CTGCCTAAGC 
GCCACCAACA 
CTTGGACTTC 
ATAGTATTTT 
ATCACAAGTC 
ATGATGAATT 
GCTTTACAAC 
ACTAGCTACT 
AGAATGTTTA 
CTGGATTGAA 
CTGGTGGCCA 



TATTTAGGGT 
ATTCTGGGTC 
TGGAAGCCTG 
AGCTTGGCAA 
GGTTCCATTC 
CAGTAATCTC 
CCAAGAGGTC 
TGAAGAATGA 
AGCTAACTCA 
TCAGCCTCAG 
ACAAAAATAA 
CCCGCTCGCA 
CCTCTAGGGG 
GTGTCTAGGG 
TCTTTTAGTT 
TTGTCCCAAG 
TAAATCAGAC 
TCTCAGCAGT 
CCACTCAGTG 
GGCAAACGTT 
GTCTTGTGGC 
TGCCTGTCCT 
ACGCCCTTCC 
ACTTTCCCCC 
CACTTCATCA 
AGGAGAAATG 
AGGGAGAATC 
ACCTGGAACA 
TTCAGACTTC 
TTGAGGTACT 
TGTGACTCCC 
GGAGCTGGGG 
TGATGAAATT 
TAATCCCAGC 
CAGCCTGGCC 
GGTGGTGCAC 
CCAGGAGGTG 
TCCATCTCAA 
TAATTCCAAC 
CAGCTTGGAC 
GGTGGTACAC 
CCAGGAGGAG 
GGGAAAAGGA 
CCTACAAGCA 
AGTCAGGAAG 
TGAGCTTCCA 
ATTACAGCAG 
CACGCCTCCA 
ATTTTTAAGA 
GTGATAATAG 
GGCCACTTGT 
ATTTTACTTA 
CAGCACAGCT 
AGCAGCAATG 



TCAACAAGAC 
CTTCATCCTA 
GATCTTCATC 
GAGTATCTGT 
CCACTAGGGT 
AGCATGGTAG 
TGATGGCTCA 
ATCCTGGGCT 
TGAGAGCCAG 
CAGTCCACAT 
AGTTGTTGAA 
TGGCATGTGA 
AACAGTAAGA 
TTGAATGTTT 
TTGCCATTTA 
GACAGAAGAA 
CACACCTGGG 
TGGGCAAAGC 
GCCCAGGCTC 
TGTTGTGTGC 
AGGCCAGGGG 
CACCGTGGTC 
CTTCCCCACT 
TCCTGTATCA 
GTTAAGATAA 
TCATACACAG 
ACCATTCAAG 
GATTATCCCT 
CAGCTTCCAG 
TTGTTACTGC 
CGTCGCAAAA 
CGTTTGGGAA 
CATGCCCTTA 
ACTTTGGGAG 
AACATGGTAA 
GCTTGTACTC 
GAAGTTGCAG 
AAAAAAAAAA 
ACTATGAGAG 
AAAATAGTGA 
ATCTGAGGCT 
GCTGCAGTGA 
GAAAACAGTG 
CAAAAAGGAC 
AGAGCGTTCA 
GAACTGTGAG 
CTCAAGGTAA 
GAAAAAGACT 
ACTTTTAAGG 
AATGCTCTGT 
GACTATTGTG 
ATTTTAATTC 
CGAGTCTTTT 
GCAGGTAGTA 



TGAACTTCTG 
ACCCCCTGTT 
CCCTCATGAT 
CTTCTCCTCA 
GGCACCCTAT 
CACAATCGAA 
TCACATAGAC 
CTGCTCTTCC 
AAACCAACTG 
TCTAGGAACC 
GTCCTAACCA 
TAGGAGTGTG 
CGGGCAGGTT 
ACAGCTCCTG 
TAGGCAGCTG 
GGCTTTCTGT 
CTTAGAGAAA 
CAAAAGTGGA 
TCCTGCAACC 
TCTTCTGCCA 
AGGTCTTGGG 
CCTGGGCACA 
TCCATATCAT 
GGACCTGTGA 
GAGTGGGCTC 
AGACTGACAC 
TCAAGCAATG 
CATTGCCTTC 
GACTGTGTGA 
AGCCCCAGAA 
TTCATATGTT 
GTCATTATAT 
TTAAAAGAGA 
GCTGAGGTGG 
AACCCCATGT 
CCAGCTACTT 
TGAGATCACA 
AAAAAAAGAC 
GCTGAAGCAG 
GACCCCCAAC 
CCAGCTACTC 
GCCATTGCTG 
AGACCTCTTT 
ACCACATGAG 
CCTAGAAACT 
AAAGTTATTT 
CTAACATAGT 
TCCCTAAAAA 
GATCTGACAA 
GATGACAGAA 
CACTTGAAAT 
ATTACAATAG 
AGAGGGAGAC 
CACACACAAG 



ACTCCTTTCC 
CATGCCATAG 
AATGAGTGTC 
TGGGACGGTC 
ATGGTCTGAG 
AAGGGCTAGG 
TGAAGGAGAT 
TAGGCCTGTC 
CAGGCTGGCC 
CTCATAATAT 
CCAGTACTGA 
GCTAATTTCT 
GTGGGTCTCC 
AAGCCACAGT 
GTGTTAACCA 
ATCCCAGGTT 
GAGTGCAAGG 
TGGAGTGGGA 
ACCCCAGTCA 
GTGTGCTCCC 
AAATGCAACA 
GGCCTGGGGG 
TTAAAGGGAC 
ATGTGGCCTT 
TAACCCAACA 
CTATAGAGAG 
AGTCTGGGGA 
AGAAGGAATC 
CGATAAATAT 
AACTAATACA 
GAAACCCTAA 
TTAGACAAAC 
CAACAGGCCA 
ATGGATCACC 
CTACTAAAAA 
GGGAGGCTGA 
CCACTGTACT 
AATAGAGCCA 
GAGGCTCGCT 
TTCTAAAAAT 
TGGAGGCTGA 
TCCAGCCTGG 
TTCTCTCCTC 
CACATAGTGA 
GAATTGGCCA 
TTTTTTTAGC 
AGAAGGGATG 
TTAGTCTGAG 
GTTTGCAAGA 
ATCTTTCCAC 
GTGACTGGTG 
CTACATGTAG 
AGGACTCACC 
AGGCAGATGA 



CTACCTCTCC 
CCACCCTGTG 
CCATTCAGGT 
ACATTCACCC 
TCCAGGCCTT 
CACGGCAGCA 
TCTGAAGAGC 
TTCCTCTCTC 
TCAGGCACTT 
GGGTTGAAGT 
AATGGGAAAA 
TCAGTGCCTG 
AACCCCATGA 
GGGTGTGTGT 
ACTCAATTAG 
CTTGCCTTGG 
TTTTATTAAG 
AAGTTTTCCC 
AATTCCGCCT 
CTGGACGTCC 
TTTGGGCAGG 
TGGAGCCCTA 
CATGCCCTTC 
ATTTGGAAAT 
TAAAGGGTGT 
AAAATGTGGT 
TACCAGAAGC 
AAACCTGATG 
CTGTTGTTAA 
GTAGGTACTA 
CCCCCAGTGT 
TCATCAGGAT 
GGTGCAGTGG 
TGAGGTTGGG 
TACAAAAATT 
GGCAGGAGAA 
CTAGCCTGGG 
GGTGCTGCAG 
TTAGCCCAGG 
TTAAAAAATG 
GGTGGGAGGA 
GCTACACGAG 
CTTCTCTCCA 
GAATGCTGCT 
GCACCTGGAT 
GACTAAGTCT 
AATTATGGAG 
CAAAATTCGA 
GCTAGAGAAT 
ACTGTTCAAA 
TCTGAGGAGC 
CTAGGGGCTA 
AAGGTGGATG 
TACAACACAT 
CCTTCCCAAA
TACCAATGTG 
TACTTAAAAA 
AAACTTTTAA 
CCCATAGAAC 
TTTTGCAATT 
TACTGGACTT 
CTTAAAAATC 
ATGCTAATCA 
GCTATAAGTT 
CGTATAAATT 
CCAGACTTTA 
GTACAGAGGC 
CTGTTCACAA 
CAGAGTTTCA 
AGTCAAAAAC 
CAATAACTCT 
ATAATTTTGG 
CCTTATTTCA 
GGATACTCCA 
AGCTGTAAAG 
ATGCTAGTAG 
GGGGACGCAG 
CTGTCTCAAA 
AAAAAGGAAA 
AACAATACCC 
TCCTGTCTCT 
TCAAGTTAGG 
AATATTACTC 
TGAACTACCT 
GGAATGCTGC 
ACTCTGTTTA 
ATAAAAATGG 
GGATAATTTA 
CTGTAATCCC 
CCGAAATGGT 
ATGCCTGTAG 
CGGAGGTTGC 
TCCGTCTAAA 
TGCATTTTAT 
AAGAGTAAGT 
AGACATAATT 
TGGCCGTTTT 
TTTTTTGTTG 
GGTGTGATCT 
GCCTCCTGAG 
TTAGTAGATA 
TCGGCCCGCC 
CCGTTTTTTT 
ACCAGCGTAG 
CCACATGAAT 
AGGTCCTGGG 
TTCTCAGTAC 
GTATGTGAAT 



CCTGGAGATA 
CATTTTTATG 
ACAACATTCA 
ATTCAATTTA 
CCATAGAACA 
TGACGAACTT 
TTAAAGTATC 
TTAAGACAAT 
ACTTAGATTG 
TTCATGAGTT 
ATTTAACATA 
TGCTGCAGCA 
AAGAGTCAAG 
TCTTCAAAGT 
AAACAAGACA 
ACAATTGACA 
AATTATAACT 
AACACATATT 
ACAATCCTAT 
GGGGCCCTCT 
GCTCAAAACA 
CACCTCTCAG 
TGAGCTATGA 
AACAAAAACA 
AAAAAAGTAT 
CAAAATAAAG 
GAGTCCCATT 
TCATAGAAAT 
TAACTTTCCC 
TATTTGATCA 
ACAGAGAGGC 
TTAGCAATCC 
ACAATTTCCC 
TTAATATACA 
AGCACTTTGG 
GAAACCCCGT 
TCCCAGCTAC 
AGTGAGCCGA 
AATAATAATA 
CCTATTAATC 
TTTCCCTTAG 
AAAGTGGCTT 
CTCCTTTGAT 
TTGTTGTTTG 
CCGCTCACTG 
CAGCTGGGAC 
CGGGGTTTCA 
TCAGCCTCCC 
TTTTTTGGTT 
TTATCATTTC 
TTCTTGTCTA 
AGCCAGTCTC 
TGTCACTGTC 
GAGTTTTGAA 



AGCTCACCCC 
TCCTTTTCCA 
ATTCATTATT 
CAATGCTTAC 
AATAATCTAC 
TAAGAAGAAA 
CAATTGACTA 
ACTTAATATG 
GTATAAAGTT 
GAGTTTTTAC 
AAATATTGTT 
CCTTTGCCTG 
AAGATTAGTT 
TATCAGAAAC 
ACATTTGTCT 
AAGAAATTTA 
GATGACACAA 
CACAGTTTTC 
ATAACTAAAC 
GTAGCATCCA 
CTTAATGAAC 
TTGTGGCTAA 
TTATGCCACT 
AAAAACAAAT 
GCAGTCTTTG 
ACCGCAGAAG
CTCCCCGGAG 
CAAAACACCT 
TCTGTTTTTC 
TAGATCACCA 
CAAGAAGAAT 
TATTTCTACA 
CTGTACATGT 
CATTAATAAA 
GAGCTGAGGC 
CTCTATTAAA 
TGGGGAGGCT 
GATTGCGCCA 
ATAATAATAA 
TTCCTCTTGT 
CCCCTACAGG 
CTCCATGAAG 
CTCTACTTCA 
GAGACGGAGT 
CAAGCTCCGC 
TACAGGCACC 
CTTTGTTAAC 
AAAGTGCTGG 
TTTGCATGTC 
TACTGCTTAA 
TTTGACAATT 
TGTACTTGGC 
AATTGTGGGT 
ATCTGCTGAG 



ACAATCCCGC 
TACAGAAAGA 
ATGACAAAAT 
TATTGGCATT 
CAAATTTTTA 
ACTTATAAAT 
ATGAACAAAA 
GCAAATCTTA 
GAGTTAAAAA 



AATCACTTGA 
ACAAAACCTC 
AGTTCTTGTC 
TTCCAATAGT 
CTGCAATTGA 
ATGAATGTTA 
GTCACCTCTG 
ACTCAGATAT 
ACTGAAATAT 
GTGTCAAATG 
AAAGTTAGGG 
CTCTAGTCAT 
GCTGGGAGGA 
GCACTCCAGC 
TGCCTATGCT 
TAGGTCCTTG 
CCAAAGTTTT 
TCTAGCCATA 
TTTCCCCAGA 
TGTGTAAAAA 
GACCGCATTC 
CTAGACAGAC 
CGGCGGCCCA 
TAATACACAT 
TTGGATGCAG 
GGGCAGACCA 
AATACAAAAG 
GAGGCAGGAG 
CTGCACTCCA 
TAATAATAAT 
CGGTGGTTTT 
TTCTTATGTT 
ATTATTTCTG 
CACTGACCCA 
CTTGCTCTGT 
CTCCCGGATT 
CACCACCAAG 
CAGGATGGTC 
GATTACAGGA 
TTCTCCCTTT 
TAATTGTTTT 
TATTCTCTTT 
TGCTCCAGGG 
AATAATTATT 
TAATACAGTG 



CGCTGAAATA 
TCATTCAACA 
TAAATTAATA 
TATTAATCTA 
ACATTCATTT 
TGCAATTTTT 
CTGCTCCAAA 
ACTTCTTAAA 
TCACAGGATA 
AATGCTTAGA 
TGGAGTGTCA 
CTGCATCCAG 
TCAGCTCACC 
GGGTTATAAT 
AAATGTCCTA 
TGATTTACAA 
CAGAACTCTA 
GACCTGAAGA 
ATCCTGTTTA 
GTTAGCAAAG 
ATCTGTTCTC 
TCTCTTGAGC 
CTGGGCAACA 
GTGGTTATCT 
GGGTTTGTTG 
TCTCTGATCT 
GAAATGAGAA 
GCCCAGCCAT 
CTGGCCATAA 
CAGAGAGGAT 
AGGCCTTGCT 
TACTTTGTTG 
TAATAAATTG 
CCGGGTGCAA 
CGAGGTCAAG 
TTAGCTGGGC 
AATTGCTTGA 
GCCTGGTGAC 
AATAATAATA 
CAGCGACTCT 
TAATTTGTTA 
CATCCATTAT 
CATAAAACAT 
TGCCCAGGCT 
CACGCCATTC 
CCCGGCTAAT 
TCGATCTCCT 
GTGAGCCACT 
TACTGTAAAC 
GGGGAAGTGA 
AGGAATAGTA 
TCCTACTTCA 
TTTGTCCACC 
TCAACCCAGT 



GAGTTGATGT 
AGTACTATGG 
GCTCTTCCTT 
CCAATTTTTT 
TTGGCAAGGC 
AAATCTGACA 
TTTTTCAATT 
CTTTGTAAGA 
CATCATCTCA 
ATAGGAAATA 
GTTTCTCTGG 
GAAGAATTAG 
TAGTTAACTC 
CCATTCTTTG 
GGGTAGTCAC 
TAGCCTAACA 
GAAATCCCCT 
TCAAATATCA 
CCTCTCCTTT 
ACAATTTTGA 
TACTCACTAA 
CTAGAAGTTT 
ATGCAAAATC 
CACAATTAAT 
GAACTCAGAA 
TCTCCTGCCC 
TTCCTCTTCC 
AAAACCTAAA 
AGAAATTATC 
CCAGAAGGAA 
GGGTTTCCCT 
AATCTAAAAA 
GATATAAATT 
TGGCTCACGC 
ACCACCCTAG 
GTGGTGGCAC 
ACTCGGGAGG 
AGAGTGAGAC 
ATAAATTGGA 
TCAGAGGCCA 
CTCTCATTTA
TTGGTAAGAT 
CACTGCCTGT 
GGAGTGCAGT 
TCCTGCCTCA 
TTTTGTATTT 
GACCTCGTGA 
GCGCCCGGCC 
TATTTCCACT 
ATGCATCAAC 
TTAACTCCTA 
GTTTCCCAGC 
AAAAGACTCT 
TAATGATTTG 
CCGGGCGGCT

TGTCTGAAAG 

CAAAGCAAAC 

TAAAAGTTAA 

TGACAATTAG 

TGGGGGAGGG 

TTATGGGGTC 

TTGGGTTGCT 

ATTAATAACA 

TTTTATAATT 

CAGTATTTAT 

AGAAGCGATG 

AAGACCCTAT 

ATCTCTGATA 

AACACCACCA 

TTGTTAGGCT 
GATTTTTCTA 

TGCCTGTGCA 
ACAGGCATTA 
TAAAAATTAC 
AGGCGGGCAG 
GTCTCTATCA 
ACTCAGGAGG 
AGGAATATAC 
AACCGCATGA 
ATGTATTGAA 
GTTGGGGCAT 
TGATTTAGTT 
AGCACTTCTT 
GATTACAATG 
TGCCTGGTGC 
TGACAAACAA 
TTGCACACAA 
TTTTTGTACT 
ATCCTCATAA 
CTGAGACACG 
TTGAATTTGA 
TACTAAGCTG 
TGCTTCAAGT 
ATCCTTTGAT 
AGCAGGTTGA 
ATATATTGGC 
AGTTGGGACT 
GTCCATCATA 
ACATCCTCAC 
AATGCACACA 
CCAGCTCTAT 
AGAGGCAGCT 
CTCTTCGCAT 
AAAAAAAAAA 
GCCTCAAATG 
TAATTTAATC 
GCAGGGTAAC 
AGAACTGCTG 



TGATCAGGGG 
CACAAACAAC 
CTATGTTTTG 
AAAAAAGCTT 
ATATTTTCAA 
TTCTTATTCT 
TTGTTTGAGG 
CTTAGGCACA 
TTATTATTAC 
TTGCTTCCTG 
GTCTGTCATC 
GTCATTTTAC 
GTTTAACCTC 
TCTTTTGCAC 
TGAAGGTAGA 
TCTTGAAGAT 
GACCACTGAG 
ATCCATGCAG 
TAATTTCTGT 
CGGCCAGGTA 
ATCACCTGAG 
AAAATGTAAA 
CTGAGGCAGG 
TCTCTTTCAA 
CATAGGAAAT 
GGAGTGAAAA 
ATTTTAATTC 
GATACTTTAA 
GGAGAAGTCT 
GTGGTTGTCT 
AGAGGAAGGG 
AAGCTTAACA 
GATTGTAGGT 
TAAAATATGT 
TGACCCATGA 
AAAAGGTTTA 
ACTCAGACAT 
CCTCTGTATT 
AACCATGAAA 
AAGCAAACAT 
GATGAATTCT 
TAGGCACACC 
GGGTAGTTAT 
AGCTTGGATG 
ATCACAGTGA 
TATAATTGCT 
CTCTTATCAT 
AAGGGAAGCA 
CCAGCCCTCC 
AAAAGAAAGA 
AGAGGCTACT 
CTCACTGTAT 
CAAGCTCATG 
TCTTTCTGCT 



CTGTCCAACT 
ATCCTACATT 
AATTGTTATT 
TATATTTCAT 
TTTAATGAAA 
GTTGGACTTT 
TGTGTGTGTG 
TTGTAAAGTC 
AGCCTGATCA 
TCAGGCAAGA 
CTCAGTCATT 
TTCAAAAATG 
CACTCCCGGG 
AGCCACTATT 
GCCTGTCTGA 
GTTGATCAGT 
ACAAGTGTCT 
TCTCATGGCT 
CCACTGAAAA 
CTGTGGCTCA 
GTCAGGAATT 
AGTTAGCCAG 
AGGATCGTTT 
GAGTTCGTGG 
GCCTGTGACA 
CGCTTCCATC 
ATGCATTTTG 
TATGTGTGTG 
GAATTCTCAT 
CATAGAATGC 
TTCAGTTAAC 
ACAACACCAC 
AGGATGTTTT 
CAGAGGTTGT 
AACAGGTAGG 
TTAACTCACC 
TCCAGGTTCC 
TTTCCTTGAT 
AATATAAACA 
AATAAAAATT 
ATAGTAAAAA 
TGCCTGCTAT 
GTGAGTGTCA 
ATGGACAAGG 
GAATGAGTGT 
TGCACACACA 
TAGGCTTCTT 
CACATAATTA 
AAGTTAAGGA 
AACAGAAGGA 
GTGTGCTGAT 
TTCTGGGAGT 
AATGGAGAAA 
CTTCCACACT 



ACCGGCATTT 
GTAAATGCCT 
CTTCAGCAGT 
TTTCTGCCTA 
TTTTTTTTTA 
TACATAACCT 
TTTAAGGGAA 
ACACACCTGT 
CCATCATTAT 
GCCAATTTCA 
TTACTTCACT 
AAAAGAATTA 
TAAAATGGTC 
ACCTACCGTT 
ATTATTTTCT 
TGTTTGTGGA 
AAGACACTTG 
TCCCAGTGCC 
GGACAAAAAA 
CTCCTGTTAT 
CGATACCAGG 
GTGTGGTGGC 
GAGCCCTGGA 
TTTTGACTGC 
GAGGGGTAAG 
CCTCTACTTA 
TAGATAGAAA 
TTTAGGATGC 
TCTCCATTTC 
AGGGAGTCAG 
TGTCTGTATT 
CAACAACAGT 
AGAAAAGTTA 
TCTAAGAACT 
CTTATTATTG 
CAAAGTCACA 
AAGACAGTCT 
TACTTTGTAA 
ATCTATGTAT 
TGATATCAAT 
AGTGCAGAGT 
CAAAGGTATG 
TCAGAATTCT 
AGTGAGCTCC 
TCTAGACTGT 
CACATACACA 
GGGGCTAGTA 
GAAAGAATGA 
GAGTACCATC 
TATCATACAG 
CCCAATCCCA 
ATTATTCCCA 
CTGGGATTAA 
ACCAGCTCAG 



TGATTTGGAG 
TTGGCTACAG 
TCTGCTAGCC 
AACTCTTTAA 
GTTCACAGAT 
CCACTTTAGT 
TGTGGTTTAC 
ATTCTTATTG 
TGATATATCT 
GTGCTACCAT 
TGTTCTTAGC 
ATATTTTTAC 
TAGTCCCTCC 
TTCTAGATCC 
TGTCCCGTGA 
GTGAATGAAT 
TTCCTTCCCA 
TCAGAATTAT 
CTAAGTGTAT 
TCCAACATTT 
CTGGCTAACA 
TCGCACCTGT 
GGTTGAGGCT 
CACCTAGCGT 
GTGAGAGAGG 
CTAAATATAT 
AACAAAAGTT 
ATGATTTATA 
CTTATTGGCA 
AATGAAAATA 
AATATTACTG 
TGCAGAATTG 
TTATTTAATA 
ATTTAAATGT 
TCTCTTTACA 
CAGCTGGTAA 
AATTATTCTT 
AAGTATGAGG 
CAACTGAAGC 
CAAAACTTTC 
GCTGGAATAC 
CACACACCTT 
TTCCCACTTG 
CAGAACAGTG 
TTACACACCT 
CTCATCTCTT 
CCTAGGGCCT 
ACCAGCTTGT 
TTTCTTAGGG 
CAAGGATCTA 
GGAACTGTAT 
TTTTACAGAG 
ATATAAAGCT 
CTGTGCTCTC 



CGTCATCTAG 
AGATTGAAAC 
TTGAAAAATC 
AATTGCTAGT 
TAATACACAA 
GCAGTCTGCT 
AATCAAAATA 
ATACATAATG 
AAATAATGAA 
GTTTGTATAG 
CAAACGGCCG 
GTTTCCCTTA 
TTTTCATATC 
CTATTCTTCA 
ACTCAGTACA 
CAGCTAGCAT 
TGTTCTTGCC 
CCCCTGTCAA 
AGCTAGAAGT 
TGGGAGGCTG 
TGGCGACCCC 
GGCCCCAGCT 
GCAGAAAAAT 
ACATCAGAAA 
TTGATGAAGA 
TAGTTAAGTA 
TTATTCTGTT 
ATCAGTCTGC 
ACGTGAGAAT 
GTCCATATAA 
ATAACAGTCA
AGCCACCAAT 
TATGTATATA 
TAACTCCTTA 
TGTGAGAACA 
AACGGCAAAA 
TTGACTAATA 
AAAATATAAG 
ATAATTACAA 
ATGTAATGTA 
CATGCTCCTA 
GGATACAGAA 
GGAAAGAATT 
ATGTGGGGAT 
ACCACTCCTA 
CTCTGGTGGT 
GTATCCTTTC 
TGGATTTGGT 
TCACCAAAGG 
ATGCAAATAT 
GCACATTATC 
AAGGAACTTG 
TCCTTGCTCC 
TACATGCAGG 
CAGTTTTACA
GGGAACTTTT 
CTATGGTATA 
ATAAATAAAT 
ATTAAATAAT 
TATTTTTTGG 
AAGAACCACT 
CTTGAAGGCA 
CAGTGGCTGT 
CTGTCCCACC 
TATTTCTCCT 
GTAAAAAACT 
TTTCTTCCAA 
AATATACGTT 
GTAAAAATGT 
TTCTGAAATG 
CCAGTTTCCT 
TTTTGCAATA 
TTATATTTAT 
AGTGTTGTGA 
TCAGCCTCAT 
CTATTTATTA 
TTTGAGGGCG 
AAATATTTAG 
GAGCAAAACT 
AACAGACAAG 
TTGGTCTCAA 
GTGGCTGTTA 
CAAAGTTCTG 
GAACCCTACT 
CTGAGGCAGG 
ACCCCTGTCT 
GAAAAACCAG 
TTGAATATAT 
CATTTTGCTC 
TGAAACAAAA 
ATATATTGCA 
ATCAGTTTTG 
TTTTTTTTGA 
GCTCACTGCA 
GCTGGGATTA 
GAATTTCACC 
AGCCTCCTAA 
AATAGACTTT 
AATACAGTTT 
GCTTTGTATC 
CGAAATTGAA 
TATATGTACT 
AAGGAGGAAG 
TGGTCAGGGA 
TGACTTATTA 
TTTCCATCTC 
TTTTAGATAG 
ACAGTGTTGT 



AGTTTCAGAT 
GGGTTTACTT 
TATGATAAAT 
ATTAATTTTA 
TAATACTCAG 
AGGCCTGATA 
ATTTTAGGCT 
AACGTTTAGC 
GTCTTTTCTA 
AGGTATAAGT 
GGTGTCCTGT 
AGACATTAAA 
TATAGTCATT 
TGCATCTTGT 
GCATATCCTC 
CTTTGACATC 
ATGTCACTTA 
TATTGTTCCT 
GTATTTATTT 
TCATAGCTCT 
GAGTAGAGTA 
TGCTCCTACT 
TGTTAGGGAA 
CCTCAGGTTT 
GTGGCTCTGG 
AGCCCCTACA 
GCAGATAGCA 
TTATTAGCTT 
GCCTATACAG 
CTAAGGCTGG 
AGGATCACTT 
CTATCAAAAA 
CTGTCACCCT 
CTTGGCTGTT 
TGCATTTTTA 
CTAGTCAACC 
TTACAATATT 
GGTGGGACGA 
AATAGAGTCT 
ACGTCCGCCT 
CAGATGCACG 
ATGTTGGTCA 
AGTGCTGGGA 
TTTTTTGTTG 
CCATGGAACA 
TTCCAGTTTT 
CAACCAAGTG 
ATATATAGTG 
CAGAATCACA 
AAGGATGTAT 
GGCAATACAA 
TATGACAAAA 
TCTGGACCCA 
CCAATAAGGT 



TAGCCTGGGA 
TCCATTTTTT 
ATATGGCTAC 
TAATATTTTA 
CTTTGTTTTC 
GTTTTTAGGA 
GTTGTCTTCT 
CAGCACATTA 
TCGATTTCTC 
TCTTGAGAGG 
GCTTAACAAG 
AAATAATGTC 
GTGTCAGGTT 
GCTTTATAAC 
ACAATTGACA 
ATTTGAAAGA 
TACAATTATA 
TTTGTAATAC 
TTCTGGACAG 
CTGCAACTTC 
GCGGGAACTA 
GTGTGCTTTA 
TACAGATGCA 
AATCTAATTG 
GTTATATGTT 
ATCTTATTTA 
ACACTAACAC 
CATTAATTGG 
GATTTAGTAA 
GCTTGGTGGT 
GGTGCCAAGA 
CAAAGAACTC 
CATTCCTTAC 
TGAGTCTCTC 
ACTTTTCTAC 
TATAATATTT 
TTAACTGTGT 
CCACATCCTT 
CGCTCTGTCA 
CCTGGGTTCA 
CCACCATGCC 
GGCTGGTCTT 
TTACAGGCGT 
TTGCTCACAG 
CCAACCAGAT 
TCAGAATGGC 
TCAAAGTACA 
AGCTTGTGTA 
ATTAGGTCAA 
ACTGGAAGAG 
TAATAACTTT 
TCCTTATTAA 
ATAAAATGTA 
ACCACTAGCT 



CTTCCAGGGT 
CTTCATACAT 
ATATGAACTA 
AAGGTTATCA 
CAAAGTGATA 
GTGTAAAGAA 
GTCTTATTTT 
ACATTTTATG 
ACACTGTATG 
ACACACTGCT 
TGCTCATTAA 
AACCAATCTA 
ATGTACTTAT 
TGCCTTCATA 
AATTCTTATC 
AGCTTGAAGA 
ATGGCAATTT 
TCTCTATGTA 
AGTCTTGCTC 
AAACTGCTGG 
CAGGCGCATG 
GTATATTTTC 
GTAACTTTGG 
TTGGCCATTT 
AAAAAAAAGT 
GGCTGAAAAT
TTACTCTTTG 
TGAGTCAGGA 
TATTAGGTTA 
TCACACCTAT 
GTTTGAGACC 
TAATTGGCAT 
ACCTGTCCTA 
TCTAGCCCCA 
CAGGGTTTCC 
ATGATGTGTG 
CCTCAATTTG 
AATCTGAACT 
CCCAGGCTGG 
AGTGATTCTC 
GAGCTAATTT 
AAACTCCTGA 
GAGCCACCCC 
GCTTGTTCAA 
ATCAGGTTGC 
TTCTAAAGGT 
ACATTCAGGA 
TGTGTCAATG 
AGGAAGATAC 
GAAGGGAAAA 
TAGGGTCATT 
TTTATTAAAC 
AACATTAAGT 
ACACGTGATC 



TTTGAATGGG 
ATGTAATATA 
TATAATCACA 
AATAAATATT 
AATGCCTATA 
GTCCTGATAT 
CCCAGCTAGA 
TTTTTATTCT 
ATGGTTATAT 
AGGCTGATCT 
GTGTGTAAAA 
TTGAAATTTG 
TCTGATGAAG 
TAGACACAGA 
CTTTGAGGGT 
ATAAGATAGC 
CAAAATGTTA 
TTTATTTATA 
TGTTGCCCAG 
GCAAAAGTGA 
CCACTGCACC 
TGTTGTTTTC 
TCTCAGCCCT 
GCCTTCAAAG 
TTATGGGGCT 
ATCCTGGAGT 
AGGCAGGCAC 
AAAAACAGCT 
GCTACATCCA 
AATCTCAAAA 
AGCCTGAGCA 
AGTAGAAGGA 
ACAACTCCTC 
TTACTGCTGT 
AGACCCTGAA 
TGTAAATAAA 
TTTGTGGCTT 
TTCCCTTGGA 
AGTGCAGTGG 
CTGCCTCAGC 
TTGTATTTTT 
CCTCATGATC 
GCCCGGCCAG 
TCTTATTTCA 
TATGGAGTTG 
TCTGATTCAG 
AGTTAAAAAC 
AATGATTTAA 
GGGAGAATAA 
TCAGATATAA 
TTTTCTATAT 
TTCTACAAGT 
CAGAGTTACT 
ATTGACCATT 



TTAGGGAATG 
TAACATAAAT 
TATATGCATT 
AATATAAATA 
TTTAGCAAAA 
CTAAATGTTT 
CTGGTAAATA 
TTTGTGCTCT 
TTGTCTGTAT 
TAGTTTTTAT 
ACACAGCACA 
CATTTCCATG 
ACTATTGCCT 
TTGAGAAGGT 
AGGTTTGACT 
TGTTAATGAC 
GGTAAATATA 
TTTTTAAATT 
GTTAGAGTGA 
TCCTCCTGCC 
CAGCTAATCA 
TGCAACCCAT 
TGAGGTGAGG 
ATTGAAATAT 
GAAGCCAGGC 
CCCTGTATTG 
TGCCAGTGGG 
TTAAATCATT 
AAAGATGACA 
CTTTGGGAGG 
ACATAGTGAG 
AAAAGTGAAA 
TCACTATCCT 
TTGGACTTGA 
GAGTGTGGCA 
AGAATACACA 
TCTTGAGGAC 
GGTCATTCTT 
CGCAATCTCA 
CTTCCAAGTA 
AGAAGAGACG 
TGCCCACCTC 
AGGTCATTCT 
AAATTTGAGA 
ATAGTCAAAA 
AGCTCTTAGG 
ATGACTGACA 
TTCATTAATG 
AATATGTATT 
AGTTGTTTAA 
TAAGAATTCA 
GAATGTTTAC 
TTCACGTAGG 
TGGACTATAG 
74521 CTAGACTGAT TTAAAATGTT CTAAAAGTGT AAAATACACA CCAGGTTCTG AAGATTTATC

74581 ATTTAAAAAA GAATGTCAAC TGTCTTTTTT TTTAGCTTAT TTATTATATG TTGAAGTGAT
74641 AATAGTTTAG ATATATTAAG TTAAATAAAA TATCTTAAAA TTAATTTTAC TTGTTTCTTT 
74701 TCATTCTTTC AATGTGACCA CTAGAAATCT GGAAAGTATT TATGTGATTC ACATTCTATT 
74761 TTACTGTCTA GTATTGCCTT ACATCATCAG GTACCCCATA AGTAGGCTTT TTAGATAATT 
74821 CTCTAATATA GCTTGGAAGG ATATGGAGAA ATATTTTTGC GTTGCTTTTA AGTTTTGCAT 
74881 AACTTTTTCA ACACACTTTA TAAAGGATCT AGAAAAGGGT TGGTTACATG TTTCTCTGTC 
74941 TTCTGGCCTC CACCATGTTG CCAGGAGGTT GGGGACAAGA TTCTGGGTGG CTGGATGTCC 
75001 TAATGGCTTG AGGTCTGGAC TTGAGATTTG CATATAAAGA GATGTGATTA GATTGAGTCG 
75061 ACTAGAAAAA TCATATTAGA GAACTGAATC ACAGCGATTA AATTTACATG TCGATTTATA 
75121 AACCAGGACA CCAATTTATA GTGAAAGAAG GTCCAGTTAC CTGGTAATCA AGACGTTTCA 
75181 TAGCTATTTT CATGATGGAT ATACTTAGCT GAGTTTTAAA TGAGAAGGGG GTTCATTGCA 
75241 CATAGAATAA GATCTAAGTG AAATGTTTAT TTATTTTTTT TTTTTTTTGA CATGGAGTCT 
75301 TGCTCTGTTG CCCAGGCTGG AGTGCAATGA GGCAATCTCG GCTTCTGGAG TGCAATGAGG
75361 CAATCTCGGC TTCTGGAGTG CAACGAGGCA ATCTCGGCTC ACTGCAACCT CCACCTCCCG 
75421 GGTTCAAATG ATTCTCCTGC CTCAGTTTCC TGAGTAGCTG GGATTAGAGT TGCCTGCCAC 
75481 CACGCCAGGC TAATTTTTGT ATTTTTTTTA GTAGAGATGG GGTTTCACCA TGCTGGCCAG 
75541 GCTGGTCTCG AACTCCTGAC CTCAGGCGAT CTGCCCGCCT CAGCCTCCCA AAGTGCTAGG 
75601 ATTACAGGCG TGAGCCACCA AGCCTGGCCT AAGTGACATG TTCTTATATT GTTCCTTTCT 
75661 TTCTTTTTTT TTCGACTGAG TCTCACCCTG TTGCACAGGC TGGAGTGCAG TGGCGTCATT 
75721 TCGGCTCATT GCAACCTCTG CTTCCCGGGT TCAAGCGATT CCCTTGCCTC AGCCTCCTGA 
75781 GTGCCACCAC CCCCAGCTAA TTTTTGTACT TTTAGTAGAG ATGGTGTTTC ACCATGTCCG 
75841 CTAGGCTGAT CTCAAACTCC TGGCCTCAGG TGATCCGCCC CCGAGTCTCC CAAAGTGCTA 
75901 GGATTACAGG CGTGGGCCAC GGGGCCCAGC CTTATATTAT TTCTTTTACT ACAATATATT 

75961 AGTATGATGC AGGTGCTTCA ATTGTTTATA CACTTTCCAT AATTTTGTAT AATTCTTATA
76021 CCCTGTCACT CTGAGGAATA GCCGGTCTAA GTGTTTTTCC ACCACTGCTA ATTCATCCAT 
76081 CACTAATCTC ATTAGACTGT TAATTCCCAG AGGACATAAG CACACAAGCA GACAATGTTT 
76141 ACAAATGTTG GACAAATGTT ATTTAATAAA ACAATGGGGT CACCCTTAGT CTAAAAGATG 
76201 TTTCACTTTT CATTTGTCAT TGAACTCTTA TTTGTAGGTT CCCTTTTGAC TTTCCCACAA 
76261 TCTAAGGCTG TTCTCTTTAA CACATATTTT CATGAAAACA TATATTTGAG CAGAAATTGT 
76321 TGGGGAGTTG TAATATTACC TTTGTCCCTA AATATGAATC TATAATTATA TCAAATATAT 
76381 GGGCAGACAA TTTACTTTGC CTTTAATCTC AAGAAAAAAA TAGCAATTAC TTGGGGTCGG 
76441 AGAGTAAAAT AAGAAGTAGT GAACCTTAAA GTAGCAAACT TTAGAACAGA ATAGTTTCAG 
76501 AGGGGATGAG AAGAGGTGAT TTTTCAGCTC ATCAACAACA GATCTTATAA TAAATTACAT
76561 GTTCTGGTAC TTTTCTTGTC TTTCTGTGTT AAATTTTGCT ATTTAAAAAA ATAAATTTCA 
76621 AATACATTGT TCATCTTAAA AGTCAAGAGT GTGTTTTATT AAAGTCAGTT GCTTTATTTG 
76681 CAACTCAAAA GATATATTTG AGTTCCCAAC TGGAGATTGT CCTATATGGT AACTTGCGTA 
76741 AGGTATGGTT ACTGAAAGTA ACCTACAATT TTCATGGGCT GAAATTCATT TCTATATTGC 

76801 AGCGTACAAA AATAAATAAA TAAAAAATGC TTGTTTTCTT TGAAAACATA TTATCTCAGT
76861 GCCTCTAACT GCCAAATCTA TTGGCTTTTT TGCAGGCTTA AGGGCTCTCC CTTGTTCCTT 
76921 TATGATCTCT ATCTTGAGGG CCAGACCTCC TGCCTTACAC AACTCAGAGG GGGACCTCAG 
76981 AGCTCTTTAA AAAGAGCCCA ATTTCTCGCC TGTAGAGAAG TGAAAAGGAT GCCCCACCCC 
77041 CATCTATGAA AAGAGGGATT TGATAGTTTC AATGTCTTCA AATCAAAGAT TTAAGTCTGT 
77101 AGCCCCCCAC CACCCCGGAC CCTAGCAAGG CTCATGAACC CCCTCCCATC CCGCCCTAAT 
77161 TGCTTTGGAC TGGCCGTGGA ATCCTTGTCC CAGTCCACAG TTCCTGTGCG ACTGCACGAA 
77221 GAATTCACAG AGGACCTGTG TTACTTCCCT TGTGAAGAAA CAGAATTATC ATGAAAATTT 
77281 AGGTGGAAAC CATTTCGCTT TTTTCTTCAA AAATAAGGGA AGCATGTGCC CAACCACCCC 
77341 TGGGAAAAAG AACCTTCAGG GGCAAAGGAG CGAACAGGTA ATTTATAAGA AAAACAGAAA 
77401 GTGGTCTCTG ACTGCCCCAG ACTTCCTTCG GAGTTGGGGG AATTGGGGAC GCCTGGACGC 
77461 GTTGTTTTTG CGTTTGTGGA AAAAATAAAT GAAGAGCATG AAGCCCGAGG CTTCTGAGAT 
77521 CCTTTCCTGA CCAAACCCAA GTGATTTGGT GCGGGGAATT TTAATATTTT TCCCCTTTTG 
77581 TGAGGTGGAA CAAACACAAC TTGGGAGCAG CGCAGCGGCT CAGAGCCTGC CAGCCAGGCG 
77641 GGCGACCAGA GCACCAATCA GAGCGCGCCT GCGCTCTATA TATACAGCGG CCCTGCCCAG 
77701 ACGCTGCTTC ATCGGCGCTT TGCCACTTGT ACCCGAGTTT TTGATTCTCA ACATGTCCGA 
80401

80461 
80581
80701

80761 

80821 

80881 
GACTGCTCCT
GGCGGCCAAA 
CATCACCAAG 
AAAAGCGTTG 
TCTCAAGAGC 
CTCCTTTAAA
GGGCGGAACC 
CGGCGCAACT 
GGCCACTGTA 
GAAAGCTGCC 
CAAGCCTAAG 
AGGCTCTTTT 
TGCCTTTTCT 
AGAGACGAAT 
TTTATTGCGG 
TGGCAGTCCC 
TTAGTCTAAA 
GGGTTGGTCC 
CGTTGCTTAT 
TATATTAATG 
CTGGAGGTAG 
CATTCCTCTG 
CCACAACAGC 
TGTGCCACAT 
CGTATCGAAA 
CCTGTTTAGA 
CAGTCCAAAT 
GATAAATGGG 
ATGGTGCTTC 
GGCACATTTT 
TTATTCTACC 
CGGATTCATC 
GCAGGGAAGA 
CATTACAACA 
CATATTTTTC 
GCCACCCTCC 
TCATGGACAG 
CCACAATTCT 
GCTGAGATGA 
TATCAAGCCT 
GGCATTCTGT 
AGTCATATAA 
CCCAGGCTGG 
AGCGATTCTC 
CGGCTAATTT 
GAACTCCTGA 
GTGAACCACC 
TTTATTATCA 
GCCATGTGTA 
CATCCACTGG 
TAATTAACTG 
ACTCTTCTTT 
AGATCTCTTC 
AAAGGCTGGG 



GCCGCTCCCG 

AAGGCTGGGG 

GCTGTGGCCG 

GCTGCCGCCG 

CTGGTGAGCA 

CTCAACAAGA 

AAACCTAAGA 

CCGAAGAAGA 

ACCAAGAAAG 

AAAAGTGCTG 

AAGGCGGCGC 

CAGAGCCACC 

TGTTCTGCCC 

TGGGTGACGG 

CTTCTAGGTC 

GCCCCAGGCC 

GGGATGTCCG 
GTTCTTCTAG 

TCTGTGTTTT 
TCTGGGATTT 
TGGCAACATC 
CTGTTAATTT 
GGCAATAGCC 
AACATGTAGC 
GCACAACTTT 
CAGACAGCAA 
GTTTCTATGC 
AGACATTTCT 
TGAATACAGA 
GGTGAGACCT 
ATCTCCACAA 
CCAAGAAAGA 
AGGAGAAAAC 
CGGTTTAACA 
CCAAGACCAT 
ACGCTCCTAT 
TTGGACTGTC 
TAAGGTAGAA 
TTATGTGACA 
AATGCTACTT 
CATCTCACAT 
TTATATTTAT 
AGTGCTGTGG 
CTGCCTCCGC 
TTGTATTTTT 
CCTCAGGTTA 
GTTCACAGAC 
GTTATTGCTA 
TATAGAAAAA 
GGGTGCAGTT 
AGATGTTGTA 
TTCAGAATTT 
CACCTCCTCC 
CGCGGTGGCT 



CTGCCGCGCC 
GTACGCCTCG 
CCTCTAAAGA 
GCTATGATGT 
AGGGCACTCT 
AGGCAGCCTC 
AGCCAGTTGG 
GCGCTAAGAA 
TGGCTAAGAG 
CTAAGGCTGT 
CCAAGAAGAA 
ACTGATCTCA 
TGTTACTTAA 
GGTTGGAGAG 
CCTGACCGGA 
TGTGAACGGC 
GATTGGACTA 
TACATGACTT 
TTGCTTTACT 
CGGACGCTTT 
CAGCCCTGGG 
CTCATTCCTG 
CTTCCTCCAC 
CTTCCGCTAA 
TAGCCAGCCA 
CATTTAAAAA 
AGAAAACAGT 
AATAAAGGCC 
AAGCCTAGCG 
AAATTATGGG 
TGATTAATAT 
GAAAGGGGAG 
ATTCTCCCAT 
TGGTGAACCC 
TTATGAACTT 
CAATTTTGGC 
TTAGGTTTCT 
TTGTATTGTT 
AATGGCAAGT 
CACAATGCCT 
CATCACAAGT 
ATTTATTTAT 
CACGTTCTCG 
CTCCCGAGTA 
AGTAGAGACG 
TCCGCCCACC 
TCAAATCATT 
ATCTCTTACA 
AACAGTGTAT 
TATTAAACAT 
ACGTGACTTT 
TCCTGGTTAT 
TGTTTCTCCA 
CACGCCTATA 



TCCTGCGGAG 
TAAGGCGTCC 
GCGTAGCGGA 
GGAGAAAAAC 
GGTGCAAACG 
CGGGGAAGCC 
GGCAGCCAAG 
AACACCGAAG 
CCCAAAGAAG
GAAGCCGAAG 
ATAGGCGAAC 
ATAAAAGAGC 
GGTTAGTCGT 
TGGCCGTGGT 
GGCTTTTCTC 
AGAAAAGACC 
AAAAATTTTC 
TCATTCTGTA 
GTGACTTAAA 
CCATGTTGTT 
AGGAGAGTGC 
TGGCAACGAA 
CCAAGGCAAT 
ACTGACAGGT 
TTTTGTCCTC 
TCGAAGTTCC 
ATTTGTACTA 
TTCGTTAATG 
TCTTATATTC 
GACTGGGGCT 
AGTGAGTTGA 
GGAGGCAAGC 
GGTTTAAGTA 
TCTATTTTGG 
TCATTTCTGC 
TGTTTTGTCA 
CAGGTTTCTA 
TTAAACATTG 
GTTCAACTAA 
ACTCCATTCA 
AAAACGGTAA 
TTATTTATGA 
GCTCACTGCA 
GCTGAGATTA 
GGGTTTCACT 
TCATCCTGCC 
TTTATTACAG 
GTGCCTGATT 
ATACGGTTCA 
GCATTTACAT 
AATAGCAGAT 
TCCATTTTTT 
TCTCAACATC 
ATCCCAGCTC 



AAGGCCCCTG 
GGTCCCCCGG 
GTTTCTCTGG 
AACAGCCGTA 
AAAGGCACCG 
AAGCCCAAGG 
AAGCCCAAGA 
AAAGCGAAGA 
GCCAAGGTTG 
GCCGCTAAGC 
GCCTACTTCT 
TGGATAATTT 
ATGGGAGTTA 
GAGGTTACAG 
GCTGGCGGAT 
GCAAAACAAG 
AAAAGTCCCG 
TTTAATTGGA 
AGTTTTGCCT 
GGTAGTCAAG 
GTGCAGGTAC 
GGAATGCATT
CGTGGACCTA 
TTGAGCGTAT 
GCATGACTAC 
TTTAAACGTA 
TTAACTATGA 
GTTCCCTCTG 
GCTTCTTTTA 
TCTGGAGATA 
TTTGTTAGTG 
AGAGAGACAG 
ATTTTGTGTT 
TGTAAGGTTT 
TTCCCCCTTC 
TAGGCTAATA 
TTTTGTTCCT 
TGTTGTGTGC 
TACCTAAATC 
CCGCACTTTA 
GCTATTTTGA 
GACGGAGTTT 
ACCTCCGCCT 
CAGGGGCCTG 
AAGTTGGCCA 
AAAGTGCTTA 
TATATTGTTA 
TATAAATTAA 
GTACTATCTG 
TAGTCTCCCC 
AGAGCTAATT 
ATTTTTCCAT 
AAACAATTAA 
TTTGGGAGGC 



TAAAGAAGAA 
TGTCAGAGCT 
CTGCTCTGAA 
TCAAACTTGG 
GTGCTTCTGG 
TTAAAAAGGC 
AGGCGGCTGG 
AGCCGGCCGC 
CGAAGCCCAA 
CCAAGGTTGT 
AAAACCCAAA 
CTTTACTATC 
CTGAGGTATC 
CATTTAAACC 
GGTTTTGGGA 
AGCCAGTTTC 
CCCTGCTCCC 
TGGTGGAAGA 
CTTTTCTCTT 
TTGATGTCTC 
CTTTGTCCTA 
TAAAAAACAG 
GGGAGTTTTT 
CGATTTTGAG 
GGTTGCTTAT 
TTTTGTTTGG 
AGAGTGTATG 
TTTGACATCC 
AAATCTGGTG 
AGCTGCTCAA 
ATAGTGACCA 
GAAGACAGAG 
GTTAATTTTA 
AACATATGGA 
TTCCTCCCGT 
CGCTATAATT 
TTAGTCATTC 
TATCCTCAAT 
TGTAGTATCT 
TCTCATTACT 
GAGAGATCAC 
CCCTCTGTCA 
CACGGGTTCA 
CCACCATGCC 
GGCTGGTCTC 
GATTACAGGC 
TAATTGTTGT 
ATTCATCATT 
TGGTTTCAGG 
TTTGGGAGAC 
TTCTCTCATT
ATGTATATTA 
AAAAAAAAAA 
CTAGGCGGGT 
GGATCACGAG
TAAAAGTATA 
GGCTGAGGCA 
CACTCCAGCC 
CTCACGCCTG 
AGTTCAGGAC 
GCTGGGCGTG 
CACTTGAACC 
GGGCGACAGA 
ATTGAACTTC 
TTCTTATCAT 
TTCTAATTTT 
TAATCCTAAC 
AACACATTCC 
TTAGATTCAG 
TCCATCACAA 
TTAAAATTGG 
ACTGTCAATG 
ATTTAACTGA 
ATTACATGCA 
CACAGACTTA 
TTTGGGAATG 
AAGATAGAAA 
ATATCTAGTG 
AGGGAAATTT 
TCCTACACCT 
ATAATTTGGG 
TTATATCATT 
ATTCAAATAA 
CAACTGATAA 
AACATAAAAA 
TTATTTATTT 
GATCTTGGTT 
CTGAGTAACT 
GTAGAGACGG 
CACCTACCTC 
TTATTCCAAA 
ACTCCTAGGC 
CATTGTGGTT 
CTAAAGGATA 
CTAGCATTTT 



TTATTTTCAA 

TCAGGGTGAA 

TGTGTGTGTG 

TTTCACATCA 

GATTCATCTT 

CACAGTTCTT 

ACCCAGGCTG 

AAGTGATTCT 

TCGGCTAATT 

CAAACTCCTG 

TGAGCCACTG 

TGATGAAAAC 

TACTTTGGGG 



GTCAGGAGTT 
AAAATTAGCC 
GAGAATTGCT 
TGGGTGACAC 
TAATTCCAGC 
CAGCCTGGCC 
GTGTCACACA 
CAGGAGGCAG 
GCCAGACTCT 
TGTGTTCCTT 
CTCCAAGAGT 
CCTTCAATGC 
CTCGAATGTC 
TTTAATTTAT 
TCTTGCATAT 
TTGTTCACAT 
CCATTTTAAG 
AGGACACATG 
CAAAGGACAG 
CAGAGTTCCC 
TACACCATTC 
TGACAAGGAA 
TAATTGTAGT 
ATAAGAATTG 
CTGTCACCTT 
GTTGATTTTC 
GGTGACATCC 
TTTGATTTTT 
TTCCAGAAAC 
TTCAACCATG 
CTAGAGGCTA 
ATTTTGAGAC 
CACTGCAGCC 
GGGATTACAG 
GGTTTCGCCA 
GGCCTCCCAA 
CTTTCATACA 
AAAGCTCTGG 
GAAATTCATA 
TCAGAAGAGA 
TTGAGCACTT 
ACACATTCTT 
GGACTAAAGC 
TGTGTGCATT 
AGGTAAACTT 
CAAGTTAGCC 
TTATTTTGAC 
CTGGGCAGTG 
CCTGCCTCAG 
TTTGTATTTT 
ACCTCATGAT 
CACCCGGCCT 
TACAACATTC 
TCATTTTAAG 



CAAGACCAGC 
AACCATGGTG 
TGAACCCGGG 
AGCGAGACTC 
ACTTTGGGAG 
ATGAAAATAC 
CCTGTAATCC 
AGGTTGCAGT 
CTCTCAAAAA 
TCTCCCTTAG 
TAGTCAGGAG 
CCTTTGGGGT 
TTCTGCAAAC 
AGAGTTAAAA 
GTTTTCTCAA 
AGCTTACTGG 
ATGAAAAAGA 
TTTTTCTGTA 
ATTAACATGC 
AAAGAAAAAA 
CAACAAAGGA 
ATAAATACAT 
AAGGTTTGTT 
CTCTCTTTTT 
CACAAAGGGA 
AATTGCCTTC 
TGATATTCTT 
AAATTAGTTT 
ACTGCTGATA 
AAAATTTATG 
CTTGTAATGC 
ATAGTCTCTC 
TCCACTTCCC 
GCACCTGACA 
TGTTTGCCAG 
AGTGCTAGGA 
CAGTGCTATC 
ATATTTTGGC 
CCAGAGATGA 
ATAGGGATTT 
ATTTACAATA 
GTCACAGCAC 
TTGGTGTCAT 
TTTTTTTAAA 
TGTTCCTCTA 
CTTCTTAATA 
TTTTTTTTTT 
GCGTGATCTC 
CCTCCTTAGT 
TATTAGAGAC 
CCGCCTGCCT 
TATTTTGCCT 
TTCACCAAAA 
ATTAGGTGTA 



CTCGCCAAGA 
GCAGGCGCCT 
AGGCGGAGGT 
CGTCATAAAA 
GCTGAGTCAG 
AGCCTGGCCA 
TAGCTACTCG 
GAGTTAAGAT 
ACTAAATAAA 
ATACTTTCAT 
AGGAATCAAC 
CTTAATCCAT 
ATGTTTCCAC 
ATTAGAAAAA 
TTTTGTTCAT 
CTTAGGTCTA 
TTCTTGCCTC 
CTCTTAGATT 
GAAAAAAAAA 
AAATTGAAAC 
AAGGGAGTTT 
GGGCAATAAA 
TTTGCAGAGT 
CCTGGTATAG 
AATTTGGGTA 
AGCTGAAAAT 
CAAAACTTAT 
TATAAAATAA 
AGCCAAAAAC 
ACATTGTTCT 
ATTATTCCAA
TCTGTCACCC 
CGGTTCAAGC 
CCAAACCCGG 
GCTAGTCTCG 
TTACAGGCGT 
ATGGCTACAA 
TATATAAGCC 
ACAGGCCCAG 
AGGGTACAGT 
TGCCAAGCAC 
TTTGAAGTAA 
TAAGGATGTA 
TTTAAAGTCA 
AAGAGCTGGA 
GAACTGATGC 
TTTTTTTTTG 
GGCTCGCTGC 
AGCTGGGACC 
AGGGTTTCAC 
TGGCCTCTCA 
TCTTTAATCT 
ATCTTTGGGA 
TCTGCCTGGT 



TGGTGAAATC 
GTAATCCCGG 
TGCAGTGAGG 
AAAAAAGCCG 
GCAGATTACC 
TGAAAACACA 
GGAGGCTGAG 
GACGCCACTG 
TAAAAATAAA 
GGCTACCCAT 
CCAAGCAAAA 
TTGATTTATG 
AGATGAAACT 
TTTTCAATTC 
GCTCTTTAGT 
ATGAACCATT 
AATTTTACTT 
CACTAAGTAG 
GCATGCAATT 
CTTAAAAACG 
GCACTTCATG 
AACCATGGAA 
CATCTCAGTG 
CAGTTGGGGA 
AAGAGAAGAC 
AACTTTTATG 
ATTTAATTTC 
TTTTGAAAAA 
ATCAATGAAT 
TGTGTGATAA 
ACTTTCTGTT 
AGGTTGGAGT 
AATTCTCCTG 
CTAATTTTTT 
AACTCCTGAC 
GAGCCACCAT 
ATTGAAGTAT 
TGAGGGAAAT 
TGCAAGACAG 
GGCAACAACA 
TGTTGCTGAT 
GTGCCATTGT 
GCTAGTTAGC 
ATAAATTTTT 
GTCAAAATGT 
TTAATCCACA 
AGACGGAGTC 
AACCTCTGCC 
ACAGGCGCAT 
TATGTTGGCC 
AAGTGCTGGG 
CCATTTGAAC 
TTTAATTTCT 
TCTCAATTTG 



CCGTCTCTAC 
CTACTCGGGA 
CGAGACCTTG 
GAAGCAGTGG 
TGAGGTCAGG 
CAATAAATTA 
ACAGGAGAAT 
CACTCCATCT 
GTTATGGTAC 
TTAATTGATG 
ATAGCTGATT 
TACTTTCAAT 
CGTCAAATGA 
TATTTGGCCT 
TTTGTTTTAT 
CATTTGGAAA 
AGTTTTTGAA 
TGTCTTGCAA 
TTATTAGTAT 
CGGTTAGACT 
GGATGACGAA 
GATAAAATGA 
CCAACCTTCC 
CACTTTTACA 
AGAGACCTCT 
CCAAAGTAGA 
ACATTAGTAA 
CGGTAATAAT 
ATTGCATAAA 
AACTATGAGT 
TTTTATTTAT 
GCAATGGCGT 
CCTCAGCCTC 
TGTATTTTTA 
CTCAGTGATC 
GCCCGGCGCA 
CATATTATAC 
GTAGTAAGGA 
AATTACATCA 
GTTTTGGGAA 
TACTCTATAT 
CATTCCCACT 
TGTGTGTGTG 
ATTTGAAGAA 
ATCTTCAAAA 
GTTGTCAGCC 
TCTCACTGTC 
TCCCGGGTTC 
GCCATCGTGC 
AGGCTGATCT 
ATTACAGGTG 
ATGGACATAC 
TCAACCACTT 
ACACCCTTTC 
TCTCTAAACA
TACAGATCTG 
AGTTTTGCTA 
CCTCAGAAAC 
AGTGGAAAAA 
AAGGTATCTA 
CTGGTGTGTA 
TAATACTCAC 
AGACACAAGC 
TCTGACCTGT 
CTATTGTAAC 
GACAGGGGAA 
AATCAGTCTC 
CCAGTGAGTC 
GTCCTCTTGT 
TCCAGTTTAT 
CTCAAGCATT 
TAGAATCTTG 
ATTTGTTAGT 
GGGCTGTTAT 
TAATCCCAAA 
TAGGTAAGAT 
CACTGTCCAC 
AGGACACCAC 
CTATATATGA 
GGGTTATCTT 
GGTAGTAAGA 
ATGGAGACAG 
AGACCAACCA 
AGTTTATACT 
CTAAAAGAAG 
GAACTCACTT 
TAACCTTTCT 
AACAGTGGTC 
TTGAATAGAG 
AGATTTATCC 
AGGGGCCCTA 
TATTAAGGTA 
ATTAAAATCA 
AAGAGAAAGT 
ATGTGATCAT 
GGCCGGGCGC 
TCACGAGGTC 
AAATACAAAA 
CTGAGGCAGG 
CACTGCACTC 
TGGCTTCAAG 
GCAGGTCTAG 
GTGAGTGTCT 
TACAAAGTTT 
CAATCCTAAA 
GTTATGAAGA 
TTACCTATAG 
CAAAACTACA 



TGAATGAGTT 
TGGAATATGC 
GTTTTTGATC 
ATAGGGTCCA 
GAAATCTATT 
ATATTAAAAT 
TTTTACACTT 
ATATGGTTAG 
AAATACTTGC 
TCGTGAAACC 
CACCCAACGG 
TTGCAATAAG 
CTTGAGAATT 
GGGAGTGCTG 
GCTAAATCAG 
CTATATGGGT 
GATCTTAGGT 
TAGCTTCCAG 
CCTGCAAAAG 
TGTTTTTGTT 
CTCAGGAATG 
CTCTTTCACT 
TTCACCTCAC 
CATGTCTCCT 
TAGTATGAAA 
ATTGATTAGA 
ATTTGGCGAA 
TGAGATAATT 
CTTCATGTGG 
ATTAATGAAT 
GTAATATTCA 
AACCACTGAA 
CCTTCTGGTA 
ATCCCAGACC 
TGGTAGGATG 
TGAAGCTAAT 
CAAATGTGTT 
CTTTAATCAC 
CCATTGTCTG 
TTAGTTGAAG 
GTGTACTTCA 
GGTGGCTCAC 
AGGAGATCGA 
AATTAGCCGG 
AGAATGGCAT 
CAACCTGGGA 
GAATGTTCCT 
ATAAAATGTT 
AGTGGAGAGT 
ACAACTTACA 
AACTTACTTG 
AAACATATTA 
ACAACATTAC 
GGAAAATATA 



CCAATCATAT 
CAAAAGTTAA 
TGTGAGTGAA 
CATATGTAAT 
TTAATGATTT 
ATTGAGTTTT 
AAAGCACATC 
TGGCAACTAT 
TCTGCAGCAG 
CAGGTAGTGT 
GCTCTCCTTG 
GAGCCAGCGC 
TGGGGACCAA 
CTTGGTTGGG 
TTCCTGGGAG 
GGTGCCAGCT 
TTTAAAATAG 
CTGCATGACT 
CAGTCTGGTC 
TAAAAGCAAA 
AAAAGGACAG 
GTAATAATTT 
ATCAGGCCTC 
TATCCACCCT 
CTATATATGA 
AGATATTAAA 
TTTAGTGAAA 
TGCCTTACAA 
TACTTGGCCC 
CCTTTGTTTC 
ATACAAATAA 
GTGTTCAAAT 
TTTCTTCTGA 
AGTAATTCTC 
CTGAAGAAGG 
GAAACACAAG 
AGTTTGTCTC 
GGATGGTTCA 
ATTATGTTAG 
ATGTATCTAG 
TTCGTTGCCA 
GCCTGTAATC 
GACCATCTTG 
GCGTGTTGGC 
GAACCTGGGA 
GACACAGCGA 
ACTGCTCACT 
ATGACATCTA 
AGAAACGTAT 
TGTGAAAGGA 
ACATTACCAA 
TCATCAGCCA 
AAAATAATTT 
CTTGGTAGTG 



TTATTCCTAA 
GGTGAAAAAT 
TATAACTATC 
TTTAAATTTT 
GAATCCAGTG 
TACTTTGTTA 
ACAGTTTGGA 
CTTGGACAGG 
AATCCAGATG 
CTCTAATACT 
TCCACTTCCT 
TACAGGAGAC 
AGTTTTTAAG 
TCAGAGATGA 
TGGTGGGGTG 
AATCCATTGT 
TGATTTTATC 
CCTAAACCAT 
CCCAGGGAGG 
AGTATAAACT 
CTTGGAGGTT 
TCTCAGTTAT 
TGACTAGAGG 
GAGGGATTCC 
GAAGGAAATT 
GTGTGACACT 
TTCCTGAGGC 
TGCTGAAGTA 
GTGGAAGACT 
ATTGTTATTT 
AGTTAAAACA 
TGCTTAAGGT 
GAACAGCACC 
AACTCACAGG 
CCACGTAAAA 
TGTAAGGGCC 
TCTCTCTCTC 
GGCTGCTATT 
AATCCTGATG 
TATGGGGATA 
GCCAATCTGA 
CTAGCACTTT 
GCTAACACGG 
GGGCGCCTGT 
GGCGGAGCTT 
GACTCCGTCT 
GGAATAACTC 
AGTATTCAAA 
AGAGCCAGAA 
GCTTAACAGA 
TAATGTGTTT 
CCCTGGAGGA 
CGATCTGAAG 
TCATATTCAG 



GCTATCACAC 
TAAATTATTA 
CTCTATGTCC 
TTAATAGGCA 
TAACCAAAAA 
TTTTACTAGG 
GTAGCCACAT 
ACAGCTTTTA 
TTTTCCAAGA 
TTATATTTTA 
AGACAGAGCT 
TAGAGTTTTA 
GATAATTTGA 
AATTATAGGG 
GGGGACTCAA 
GTTCAGGGTC 
CCCAGGAGCA 
AATTTATAAT 
AAAGGGGTTT 
AAGCTCCTCC 
AGACGTTAGA 
GATTTTTGCA 
ATTCCAACAA 
AATTTCTGAA 
ATATATGATA 
GCCTGGCAAT 
TGAACCTCCA 
AGAATTTTAC 
ATCAATGACA 
CCTTCTACAC 
GCTTGCAGAG 
TGACTTTATA 
ACCATCCAAA 
GTGCTCCTGC 
TTTGGCCAGT 
TGTACTTCCA 
TCTGATTTTA 
TTCACTCAAT 
AAAATATTTG 
ATAAGTTACG 
CGTAAGAATG 
GGGAGGCCGA 
TGAAACCCCG 
AGTCCCAGCT 
GCAGTGAGCC 
CAAAAAAAAA 
ACCTAAATTC 
ACACATTCCC 
GCTAGTCTGG 
GGATTTTCCA 
TGAAACTGAA 
AAGATTGAAT 
ATGGAATCAG 
AAGTTAATAA 



TCAAATATAC 
GGTATTTCAT 
TGGCACTGTT 
CATTTTAAAA 
TTGTTTCAAC 
TCTTTGAAAT 
TTCCAATGCT 
TACTCTGGGA 
AAACACTTTT 
TTGGTTTGTC 
GATTTATCAA 
TTATTACTCA 
TTGTAGGGGA 
AGCCTAAGCT 
GACCAGATAA 
TGCAAAATAG 
ATTTGAGGTT 
CTTGTGGCTA 
GTTTCTGAAA 
CAAAGTTAGT 
TGGAGTCGGT 
AAGGCAGTTT 
TACTTAGGCC 
ACAAAGGAAA 
ATCAATTTTA 
GATATCTGCT 
CTTCTGTAAA 
ACAATAATTC 
GTTAGTTTAT 
GTTGGCCTCT 
TTGTCCCAGG 
TTCTCCTGAC 
GCATCATGCA 
AGAGATGTAT 
GATCTGGGGC 
AGGTGCAGAG 
AAATTTGCAG 
CCTCCTTTTT 
GAATTTGAGT 
TGATTTGCAT 
GCTTCAAGGA 
GACGGGCGGA 
TTTCTACTAA 
ACTTGGGAGG 
GAGATCGCGC 
AAAAAAAGAA 
CTGGCAAGAT 
AGCACTGAGA 
AAAGAATTCT 
AATTTGAAAA 
ATACTTCTAA 
TCTATTTCCA 
AGTATTCAGT 
AATATGCTAT 



Figure 9 (Page 27 of 74) 



SUBSTITUTE SHEET (RULE 26) 



WO 98/14466 



PCT/US97/17658 



116/162 



87481 

87541 

87601 

87661 

87721 

87781 

87841

87901 

87961 

88021 

88081 

88141 

88201 

88261 

88321 

88381 

88441 

88501 

88561

88621 

88681

88741 

88801 

88861

88921 

88981 

89041 

89101 

89161 

89221 

89281 

89341 

89401 

89461 

89521 

89581 

89641 

89701 

89761 

89821 

89881 

89941 

90001 

90061 

90121 

90181 

90241 

90301 

90361 

90421 

90481 

90541 

90601 

90661 



TTTCTGAATT 
ATTTTCCCAT 
TTATAGCTTT 
CCCCCTCAGT 
TTTAAATATT 
CTCAAATTCC 
GTATTTGTTA 
TCCACCTGAG 
GCTATTGTTT 
ATACTGCCAT 
GGAGCCTTTC 
TTTTAACAAG 
GCCTGATAGA 
GGAAACTAAC 
TCTGAAAAGA 
TGCGACGTGC 
ACAGATTGGT 
TCACCGCAGA 
GGCGCTGGAA 
CGCGCAAGGC 
GAGCGCTCTT 
ACTTGCGAGC 
TGAACTGAGA 
TTTCAAAAGT 
TACTAGACAA 
TATCTGCAGC 
TAACAAGGTA 
TTGAAATTCT 
CCACCAGAAA 
CATGCTATTT 
TCACATTCCC 
TTGTATACTC 
TAACACCATT 
CGAGATGGGA 
CCCCGTCTCG 
AAGTACACAG 
AGCCGATATC 
ACCAATCCAA 
GTAACTATGG 
GTGTTTACTT 
TATTGGACAG 
CGTCGTTTCG 
TGGCGCTGCC 
CGGCTCCGAC 
GGGAACTGCA 
GCAAATAGCG 
CCTGAACAAA 
ATTATGTTTG 
CCCACCCCTC 
TCATTTGAAT 
AGTGGAGAGT 
AGTCTGCTCC 
ATGGTAAGAA 
TGAAGCAGGT 



TTGTGATGGC 
TATAAATTTA 
TAATAGTTAA 
TAAGTATACT 
TATTAAAAGA 
CTGGATAAGG 
CATAAATCTA 
CTCTGACTCC 
TTGTGGACTT 
CAGAACTAAA 
AATATGTAAG 
GGCAAAACAG 
CTTGTCTGCA 
ATAGACAACC 
GCCTTTTCAA 
CAGCTGGATA 
GTCTTCGAAG 
GCTCTGGAAA 
CGGCAGCTTC 
CACGGTGCCC 
ACGGGCTGCT 
TGTTTGCTTC 
GCAAGTGGCC 
CCCGCGCGAT 
TCTTATTGGA 
GACAAATTGT 
TCTAAGGATT 
GCATTCCTGA 
CGTTCAGACT 
TGTTACTGGC 
ACCCTGCCTG 
TAAAATGTAC 
AGGCTAGGGG 
CGATCACTAG 
CATAAAAATA 
GAGGCTGTGG 
GCGCCGCTGC 
ACGAAAAGCA 
ACGGCTCTGA 
GACCTTGGCC 
GACGCCTCCC 
GATGGCCAGC 
CACCAGTTCT 
CGGCTCAAAA 
AGCCCGGTAG 
ACCTATGAAA 
TCCTTTTATA 
GTGCTTTATC 
AGTGAAACCG 
CTCAGGACTA 
GTTAGTAGCT 
AGCCCCTAAA 
GCGTAAGCGC 
CCACCCCGAC 



TGTTGTTTTG 
TATTTACAGT 
CAAGTTGTAA 
AATATATTTA 
GGACATGGGT 
ATGACCGCAT 
TTTAGTGGAC 
ACCTCCAGCA 
AGGTAACTAC 
ATTGTCACGT 
TATTTACACA 
TAACTCAGCT 
GTTACAAAAC 
GAATGGGTTA 
TGAGGAAGAA 
TCTTTGGGCA 
AGTCCCACCA 
CGCAGGTCGG 
CGGATCAGCA 
GGGCGGTAGC 
TTAGTAGCAA 
GTACGAGCCA 
TTTAAATATA 
AAAATCATTG 
TGAGTTGCCC 
CTAAAATTCT 
TTTAAAATGT 
CAGTCTCGCA 
CATGTCGGGA 
GAACAGCAAG 
TTCTCAAAAT 
TTTCTAAAGG 
GGCGGTGGCT 
AGGCCAGGAG 
CAAAAACTAG 
CATGAGAACC 
ACTCCAGCCT 
AAAAATACCC 
AAAATGCCGT 
TTATCGTGGC 
TGAGCAATAG 
TGCAGGTGGC 
AAGATCTCGG 
TAATTGCCCT 
CGACGAACAA 
GCAGCGGAAA 
CAAACTGCAA 
CAATAGAAAA 
TGTTTCTTTT 
TAAATACATG 
TTTCTATTCT 
AAGGGTTCTA 
AGCCGCAAGG 
ACCGGCATCT 



TCAGCTTTTA 
CTGCAGTACT 
AAGGTTTGAT 
GAAAATGGAT 
AAAAGAGCTT 
AATCTTTGGA 
TTTTGGCAGT 
GCCCAAAACC 
ACACACATTG 
GGATTAAAAG 
TATACATGCT 
TGTTTTCTCG 
TTGTGTGTAG 
CAACTGTTTT 
ACGGGCAGAC 
TGATGGTGAC 
GGTAGGCCTC 
TTTTGAAGTC 
GCTCGGTGGA 
GATGAGGTTT 
GCTGCTTGCG 
TTTGCAATGA 
GTGAGAAACA 
GCTGAAGAGT 
CACCGCCCAT 
AGTTCATCCA 
AAATTCCGAT 
AGTTATCAAT 
AATAACGCTT 
TTTCCTTGCC 
GTCTTATTTT 
AAGGTGTTAT 
CACGCCTGTA 
TTCAAGACAA 
CTGGGCGCGG 
GCGTGAAGCG 
GGGTGACAGA 
TAACAGAAGC 
TTCAAGTGTA 
TCTGTTATTT 
TGACGTTGCC 
GGGGGATGAT 
CGGCCAGGTA 
TTCGAAAAAG 
GTTTTTGCTT 
ACTGTGAAAG 
GGCTGCAATA 
AGATAACATA 
GTCCAATCAG 
GGCTCTGAAC 
GTTTAGGAAT 
AGAAGGCTAT 
AGAGCTATTC 
CATCCAAGGC 



TAAAATTGGA 
TTTGCATTTT 
CCCCAGAAAA 
GAAATCAGCA 
TGCAGTTGCC 
TGGTCATACG 
GTGTACTGAG 
AATACTGAAT 
TCTTTATGAT 
GAGTGACGGT 
AAAAAGACCC 
CAGTAAAACC 
TTATCACCTT 
TAAGTGAAAT 
TTATGCCCTT 
GCGTTTAGCG 
GCAAGCCTCC 
CTGGGCGATT 
CTTCTGGTAG 
CTTCACGCCA 
CGGAGCTTTG 
GAGCACACAC 
TTCTGATTGG 
GACCAGACTG 
CCTGTCCTTT 
GTCCCAAAGA 
TCAGTAAGTT 
GCTGGTGAAC 
ATATTCAGAG 
CTTTGTTTTC 
GGTTGGCCTT 
TTTCTCGAAA 
ATCCCAGCAT 
CCCTGGCTAA 
TAGCAGACGC 
GCGGGGT3GA 
GCTAGACTGT 
AAGTTATCAT 
AGCTACGTTT 
TGGCAACAGG 
CAGCTGCTTG 
GCTGCGGGTC 
CTGTAAGTAC 
ATGACGGACT 
TAGCTCCATT 
ACAAGCAAGC 
GGAAGCTATC 
AATTCCATAT 
AAGTGAGGAA 
TGTTCTCTGT 
AGCAATGCCT 
CACTAAGGCG 
TATCTATGTG 
CATGGGGATC 



ATTTGATTTT 
TAATTTTACA 
CCTTGATCTA 
TTTGAATATT 
ACCCTTCATT 
CAAGTCTTGT 
GCCAGTTTCT 
TTTGGGGTCA 
AGCTTTAATA 
GGTGTCCCCA 
CTAGGAATTT 
GGTTGAAAAG 
TATATCTCCT 
TGTGAGTGGC 
TCCCCACGGA 
TGAATAGCGC 
TGCAGCGCCA 
TCTCGCACCA 
CGACGGATTT 
CCGGTGGCCG 
CCGCCGGTAG 
AAAAGTGTAG 
TCCTGTAATA 
ATTGGTTCAT 
TCGTTTCAGT 
ACAGAGTGTA 
TGAGTGGGAC 
ACTCACTAAA 
AATGAGATTC 
TAAGTCCAAG 
AAGTTTCACT 
CTTAACTTTT 
TTTGGGAGGG 
AATGGTGAAA 
CTGTAATCCC 
GGTTGCAGTA 
CTCAAAACAA 
CCTTTCTTGT 
TCTGATTTGA 
ACGGCCTGAA 
TTGACCTCCT 
TTGTCACGTA 
ACTGGCGCAC 
CTGCCCTATT 
TTCCACGTCC 
TGGAATGGCG 
CTATTGGTCA 
TTGCATAAAC 
TCTTAAACCG 
ACTACTCTGT 
GAACCCTCTA 
CAGAAGAAGG 
TACAAGGTTC 
ATGAATTCCT 



Figure 9 (Page 28 of 74)



90721 

90781 

90841 

90901 

90961 

91021 

91081 

91141 

91201 

91261 

91321 

91381 

91441 

91501 

91561 

91621 

91681 

91741 

91801 

91861 

91921 

91981

92041 

92101 

92161 

92221 

92281 

92341 

92401 

92461 

92521 

92581 

92641 

92701 

92761 

92821 

92881 

92941 

93001 

93061 

93121 

93181 

93241 

93301 

93361 

93421 

93481 

93541 

93601 

93661 

93721 

93781 

93841 

93901 



TCGTCAACGA 
AGCGCTCGAC 
AGCTGGCTAA 
AATAAGTGCT 
TCACAAGGAG 
TCTAGAGGAT 
GGGGCCGTGC 
GATCACTTGA 
CTAAAATACA 
CTTTGTTCCC 
CTCCCTCCCT 
TTCAGGGTAT 
AGGGGAGCAG 
GCCTATGCTG 
CGGGCGCCTT 
TATAATTGTT 
CCAGATTCCC 
GGGTTTGGAT 
CACTTGGAGA 
AAAAAAAAAA 
TCTGGGGCTG 
AAGCAGGAGG 
TGTAGGTGAC 
TTTTTGTTTT 
CTCGACTTCC 
GAAATGCACC 
TTGTTGCCCA 
CCAAAGTCCT 
ATCTTTTAAA 
TTGGGAGGCC 
TTAGTGAGAC 
TTCTGTAGTC 
GAGGTTGCAG 
TCTCAAAAAT 
TAAGTGAAGC 
TTGGCTTAAA 
TACTAAGGAA 
TGAGACCTGG 
GCCTGTAATC 
AGACCAGCCT 
GCATATGCCT 
GAGGCAGAGG 
ATGCACCCAC 
GGTGGCTCAC 
AGGAGATGGA 
TAATTAGCTG 
GAGAATGGCG 
CCAGCCTGGG 
AAATATGAAG 
TGCCTGCCTT 
AGGGTTTCTG 
CCCATCCAGA 
CCGCAGTGCA 
CCATAGACAG 



CATCTTCGAG 
CATCACCTCC 
GCATGCTGTG 
TATGTAAGCA 
AGCTATAACC 
CAACTGGAAT 
GCGGTGCCTC 
GGTCGGGAGT 
AATGATAGAC 
CCTGGGTAAG 
GGTCTAGTAC 
AGTGTTCCTG 
AAAAGTCTAA 
TAAATTCTTA 
TATACGGAAT 
TATCAATGAC 
ATTTCCTAAG 
TTTGTGCCCT 
GGGAAATCTT 
AAAAAAGGCA 
GGGCTGGGGG 
GGGTGGGGGA 
ATACAGCAGT 
GAGAAAGGGC 
CCAGCTCAAG 
ACCATACCCA 
GGCTGGTCAA 
GGGATTATAG 
AGAGGTTCTG 
AAGGTGGGAG 
CTTTTGTCTC 
CCAAGTACTG 
TAAGCTGTGA 
AAAAAATAAA 
ACTTCCCATC 
AATCTACATT 
TTGAGGCTGC 
TAATATAAGC 
CCAGCACTTT 
GAGCAACATG 
ATAGTTCCAG 
TTGCAGCAAG 
GCCCTAAAAA 
GCCTGTAATC 
GACCATCCTG 
GGCGTGATGG 
TGAACGCGGG 
TGACAGAGCG 
TTTTGAAGCA 
CTTCCTTTGT 
TACTATAGTC 
CCCCAAGAGA 
AAGTAAATGC 
AGCAGGACAT 



CGCATCGCGG 
AGGGAGATTC 
TCCGAGGGCA 
CTTCCAAACC 
ACAATTTCTT 
GTTAGCGAAG 
TTGCCTTTAA 
TCGAGACTAG 
GGTCGTGATG 
CCTTCGGGTA 
AGGAAACTTC 
TGGGGGTCAT 
GCGACAAAAG 
CTTCAAGTAT 
ATTTCCCGCT 
AACAGCTATG 
CCACTTAACG 
TCCCCATCTG 
TTTCGAGAAG 
GGAAGAGCAC 
AAGAAATGCA 
ATCGGAGGGG 
GTCTTTGGAT 
CTTTCTCTGT 
CGATCCTCTT 
GTTAATTTTT 
GCGAACTCCT 
GAATGAGTCA 
GGCCGGGTGT 
GATCACTTGA 
CACCAAAAAT 
GGGAGGCTGA 
CGGCACAACT 
AAAAAATCTG 
CTAGTACTGT 
CTTTTTTTAA 
AGTTTAAGAA 
ATTTTCAAAA 
GGGAGACCTA 
GCGAAATCCA 
CTACTATAGA 
CCAAGATCGC 
AAAGCATGAC 
CCAGCACTTT 
CTTAACACGA 
TGGGCGCCTG 
AGGCGGAGCT 
AGACTCCGTC 
GAAATTATTT 
TACAGAACTC 
CCTTCTGTGG 
GGGTTCTTGG 
AAGTTTACTA 
TCCCGAAAGT 



GCGAGGCTTC 
AGACGGCTGT 
CTAAGGCAGT 
CAAAGGCTCT 
AAGGTGGTGC 
ACAAGTTTTA 
TCCCGGCAAT 
CCCGGCCAAC 
GCGCTCTTTC 
CTATGTATAA 
CCTTTCTGGA 
TAGCCGTTAA 
GGCATGTAGG 
TGAGGAAACA 
CCACAAAATG 
TAGTTTACAT 
TTCTGATTTC 
GCGCCACTGC 
TCCAGGACGC 
TAGTTGAGGA 
AGAAGAAAAG 
AGTATTTTCA 
GAAGAAATAA 
CGGCCAGGCG 
ACTTCAGCCC 
TAATTTTTTG 
GGGCTCAAAT 
CCGCGCCCGG 
GGTGCAGCTC 
GCCCAGGAGC 
TTAAAAAATT 
AGTGGGAGGA 
GCACTCCAGT 
GATGCCACAC 
ATATGCAAAC 
TTATAAAACT 
GCTGATATTT 
TGAACTTTTG 
GTCAGGCAGA 
GTCTCTACAA 
GGCTGAGGTG 
GCCGCCACAG 
TCATTAAAAA 
GGGAGGCCGA 
TGAAACCCCG 
TAGTCCCAGC 
TGCAGTGAGC 
TCAAAAAAAA 
TGTCGTATGT 
CAACACTTAC 
TGGCCAGAAA 
ATCCCGCGCA 
AGAAAGTAAA 
AAGAGGAGGA 



TCGCCTGGCT 
GCGCCTGCTG 
TACCAAGTAC 
TTTCAGAGCC 
TGCTGCTATT 
GAGCCAAGGT 
TTGGGAGGCC 
ATGGCGAAAG 
TCATCTGTCT 
TTCCTTTGAT 
TAATGAAGCA 
CTTCTTGTGA 
GATATTTGCT 
ATAAGCGAAG 
AAATCGCAGT 
ATTTCATGCA 
CAGCTCTGCG 
AAAGCTTACT 
CAAAAACAAT 
GGAGGACTCA 
ACACTTGTTG 
GCGAATTTAT 
AGTTTCTCAA 
CCATCATAGC 
CTTGAGTGGC 
TGGAGGCAAA 
GATCCTCCCG 
CCCAGATTTA 
ACGCCTGTAA 
TCAAGACCAG 
AACCAGGCCT 
TCATTTGAGC 
CTGGGTGAGG 
AAAATGTCAG 
TGCCGTTGTG 
ACCACATCCC 
AGGATCTATC 
GGCCAGGTGA 
TCACTTGAGC 
AAAATTAGCA 
GGAGGATTAC 
CCTGAGCGAC 
AAAAAAATTT 
GGCGGGCGGA 
TCTCTACTAA 
TACTCGGGAG 
CGAGATCGCG 
AAAAAAAAAA 
TCTTTCATAA 
CCAAAGGTAG 
TATGTTACAG 
AGAAAGAGTT 
GTGGTGAAAC 
AGGCATCCAC 



CACTACAATA 
CTGCCTGGGG 
ACTAGCTCTA 
ACCTACTTTG 
CTGTTTCAGT 
TAACTTGGAC 
GAGGCGGGCG 
CCCGTCTCTA 
TAGCAAACTT 
AAGGTCACTA 
GGTAATGGAA 
GATGCGGGGG 
CCTGCAGCTT 
TCTGATTTCC 
AGTTTTGAGT 
TCCCAGAAAT 
AGATACAAAA 
AGGAGGGCCC 
ATAGCTAAAA 
ATGGGCCAAT 
ACTGCACAGT 
GGGCATTATA 
ACAGTTCTTG 
TCACTGCAAC 
TGGGACTAGA 
GGGTCTTACT 
CCTTGGCCTC 
ATTTTTAAGA 
TACCAGCATT 
TCTGGGCAAC 
GGTGGCACAT 
CTGGAAGGTG 
ACAGACCCTG 
TGAACAACTG 
AAAGTGACGC 
CCAAAAACAT 
TCCGGAGAAG 
GGTGTGTCAT 
TCACAATTCG 
GGGCGTGGTG 
TTGAGCCCGG 
AGAATGAGAT 
AGCCGGTCGC 
TCACGAGGTC 
AAATACAAAA 
GCTGAGGCAG 
CCACGGCACT 
AAAATTAAAA 
ATTTTTTGCC 
CTGTTGGGTC 
GAAAGAGGTC 
CAGGGTGAGT 
GACAACTACT 
CCTAGGTACA 
ATACTTGTAT
TTAGTTACTA 
TTGTTCTCCA 
TATTTATTTG 
ATGCAGCCCA 
GGTTCAAGTA 
TTTATTTTTT 
CCCATAATTG
GCTGTCTCTG 
GTCCCCCTCA 
CTATGCATTT 
CATTTCCGAA 
CTTGCTAATC 
TGCAATGGCG 
GCCTCGCCCT 
GTATTTTTAG 
TCGTGATCCG 
CCGACCAATC 
AACAGAATTT 
TTTTGGTGAC 
AATCTTCTAT 
GATTCTAATG 
ATTCACTGAC 
ATTTCTCTAC 
TGATATACAT 
CACTCTCACC 
GTTAGTCTGT 
GACAATTATT 
TACTCATGAT 
GATAGCTCCA 
TCTAAGATTT 
GTGCAATGGT 
TGCCTCAGCC 
GTATTTTTAT 
TTGGCTTAAA 
TACTAAGGAA 
TGAGACCTGG 
GCCTGTAATC 
AGACCAGCCT 
GCATATGCCT 
GAGGCAGAGG 
ATGCACCCAC 
GGTGGCTCAC 
AGGAGATGGA 
TAATTAGCTG 
GAGAATGGCG 
CCAGCCTGGG 
AAATATGAAG 
TGCCTGCCTT 
AGGGTTTCTG 
CCCATCCAGA 
CCGCAGTGCA 
CCATAGACAG 
ATACTTGTAT 



ATATGGGGAG 
TATTTTGCAA 
GATATAGGGA 
TTCCCTTAAC 
GAAAGTCTCA 
ACTCTGACAC 
ATTTTTGAAA 
ATAAGCCAAA 
CTGATACTCG 
GTTTATTACC 
CACAAAACTT 
GGGTCCCATG 
TTTTTTTTTG 
CGATCTCGGC 
CCCGAGTAGC 
TAGAGACAGG 
CCCGCCTCGT 
TGTCTTTTTG 
TCTTTTCCCC 
CAATCTTACA 
AAGTGAGATT 
ATTATTTTCA 
CTTCGCTTTT 
ACACAAGATT 
ATTTTGATTT 
AACAGGGTGT 
TCAAATTGCC 
GTTTGAGACT 
TCTTGCCCAT 
TGTATTAAAA 
TTTTTTTTTT 
GCGATCTCGG 
TCCCCAGTAA 
TAGAGATGAG 
AATCTACATT 
TTGAGGCTGC 
TAATATAAGC 
CCAGCACTTT 
GAGCAACATG 
ATAGTTCCAG 
TTGCAGCAAG 
GCCCTAAAAA 
GCCTGTAATC 
GACCATCCTG 
GGCGTGATGG 
TGAACGCGGG 
TGACAGAGCG 
TTTTGAAGCA 
CTTCCTTTGT 
TACTATAGTC 
CCCCAAGAGA 
AAGTAAATGC 
AGCAGGACAT 
ATATGGGGAG 



ATGTGCTCTG 
GAATCAACAT 
TATCTGGACA 
CGTAAACATC 
GCCTCATTTT 
TTTTCTTCTC 
TAAGAAATCA 
ACAAAAACCT 
GCTGATCGTT 
ATTAGATCAT 
GCCATAAAAA 
TAATATAAAA 
TTTTTTGAGA 
TCACTGCAAC 
TGGGACCACA 
GTTTCACCGT 
CCTGCCAAAG 
TAGAGGGGCC 
TACAATATAA 
GAAATTTTAT 
GTATTTCACT 
TTACTGCATT 
TAAAAATTTA 
GCTGTAAGGG 
TTAATACATG 
TTTTTCCTGA 
GACATGAACA 
GCACATTTTG 
TTTCTTTTGG 
GATTATTAAG 
TTTTTTGAGA 
CTCACCGCAA 
TTGGGACTAC 
GTTTCTCCAT 
CTTTTTTTAA 
AGTTTAAGAA 
ATTTTCAAAA 
GGGAGACCTA 
GCGAAATCCA 
CTACTATAGA 
CCAAGATCGC 
AAAGCATGAC 
CCAGCACTTT 
CTTAACACGA 
TGGGCGCCTG 
AGGCGGAGCT 
AGACTCCGTC 
GAAATTATTT 
TACAGAACTC 
CCTTCTGTGG 
GGGTTCTTGG 
AAGTTTACTA 
TCCCGAAAGT 
ATGTGCTCTG 



CTACAAGTTT 
TATTATCTTT 
CTCCTAAGTC 
TAGAAGCTAG 
CCTAGCCCTC 
TTTTTTTCTT 
AGAATACTTG 
AGGTCTTCTA 
AATAGGTAAT 
ATGCCTACTG 
TTCACAGGTT 
CTTATATTAA 
CTGAGCCTTG 
CTCCGCTTCC 
GATACGTGCC 
GTTGGCCAGG 
TGCTCGGATT 
TCAAGCATGA 
ACATTAATTG 
CTTGTGCAAG 
TTTCTAGTAT 
TCATTGTAGG 
AACCATGTTA 
CAAAAATAGA 
TTACCAAGTT 
CTTCCACAAA 
ATTAAATCTC 
ATAATAACAT 
GATGTTGCCT 
TTTGAGGGCT 
CGGAGTTTCA 
CCTCCGCCTC 
TGGCAAGCGC 
GTTGGTCAGA 
TTATAAAACT 
GCTGATATTT 
TGAACTTTTG 
GTCAGGCAGA 
GTCTCTACAA 
GGCTGAGGTG 
GCCGCCACAG 
TCATTAAAAA 
GGGAGGCCGA 
TGAAACCCCG 
TAGTCCCAGC 
TGCAGTGAGC 
TCAAAAAAAA 
TGTCGTATGT 
CAACACTTAC 
TGGCCAGAAA 
ATCCCGCGCA 
AGAAAGTAAA 
AAGAGGAGGA 
CTACAAGTTT 



GTGATAAAGG 
AAACAAAATT 
TGAGTCTGTT 
GAATGACTGA 
ACTCAAAATG 
CTTTTTTCCT 
ATGTTTCATC 
ACTCAAAACT 
TAACAAACAA 
TCAATCATAT 
TCCCGCTTCC 
ATACATTTGT 
CTCTGTCACC 
CAGGTTCAAG 
ACCATGCCCC 
ATGTTCTCAA 
ACAGACGTGA 
ACTTACTGAT 
TAATGTTATC 
TCTATGCAAA 
CCTTTTAAAT 
GAAGTAGATA 
CCATGAAAAT 
GATAGGAATC 
GCCTCCTGAA 
TGCTCTTGAA 
ATTGTTGTTT 
TTCTTCTATT 
TATGTACATT 
TATGATATGT 
CACTTGTTGC 
CAGGGTTCAA 
CACCACGCCT 
CTGGTCTCGA 
ACCACATCCC 
AGGATCTATC 
GGCCAGGTGA 
TCACTTGAGC 
AAAATTAGCA 
GGAGGATTAC 
CCTGAGCGAC 
AAAAAAATTT 
GGCGGGCGGA 
TCTCTACTAA 
TACTCGGGAG 
CGAGATCGCG 
AAAAAAAAAA 
TCTTTCATAA 
CCAAAGGTAG 
TATGTTACAG 
AGAAAGAGTT 
GTGGTGAAAC 
AGGCATCCAC 
GTGATAAAGG 



ATTAATTTTC 
AAGAATGCCT 
TAGTAAACAT 
CTTTCTGGGA 
GAGTTACTCT 
TCCTTTATTT 
TAAAACAATA 
AGGATGTTTT 
GCCTTGCTAT 
TAATCCACAA 
CTCGAGTTTT 
ATGCTTTTCT 
CAGGCTGGAG 
CGATTCTACT 
GCTAATTTTT 
TCTCCTTACC 
GCCACTGCAC 
GGGTGAGAAA 
ATTCAGGACA 
CCAATATGTA 
TAATAAAAGA 
ATTGCCCTTT 
GCTTTTCAGT 
ATGCATCCAT 
GGTCTGTTTA 
CAGTGGGTGT 
TTATTTTTAA 
ATGGTTTGAT 
ATTTTAAATA 
CAGTTACATT 
CCAGGCTGGA 
GCAATTCTCC 
GGCTAATTTT 
ACTGCCGACC 
CCAAAAACAT 
TCCGGAGAAG 
GGTGTGTCAT 
TCACAATTCG 
GGGCGTGGTG 
TTGAGCCCGG 
AGAATGAGAT 
AGCCGGTCGC 
TCACGAGGTC 
AAATACAAAA 
GCTGAGGCAG 
CCACGGCACT 
AAAATTAAAA 
ATTTTTTGCC 
CTGTTGGGTC 
GAAAGAGGTC 
CAGGGTGAGT 
GACAACTACT 
CCTAGGTACA 
ATTAATTTTC 
97201 TTAGTTACTA TATTTTGCAA GAATCAACAT TATTATCTTT AAACAAAATT AAGAATGCCT

97261 TTGTTCTCCA GATATAGGGA TATCTGGACA CTCCTAAGTC TGAGTCTGTT TAGTAAACAT 

97321 TATTTATTTG TTCCCTTAAC CGTAAACATC TAGAAGCTAG GAATGACTGA CTTTCTGGGA 

97381 ATGCAGCCCA GAAAGTCTCA GCCTCATTTT CCTAGCCCTC ACTCAAAATG GAGTTACTCT 

97441 GGTTCAAGTA ACTCTGACAC TTTTCTTCTC TTTTTTTCTT CTTTTTTCCT TCCTTTATTT

97501 TTTATTTTTT ATTTTTGAAA TAAGAAATCA AGAATACTTG ATGTTTCATC TAAAACAATA 

97561 CCCATAATTG ATAAGCCAAA ACAAAAACCT AGGTCTTCTA ACTCAAAACT AGGATGTTTT 

97621 GCTGTCTCTG CTGATACTCG GCTGATCGTT AATAGGTAAT TAACAAACAA GCCTTGCTAT 

97681 GTCCCCCTCA GTTTATTACC ATTAGATCAT ATGCCTACTG TCAATCATAT TAATCCACAA 

97741 CTATGCATTT CACAAAACTT GCCATAAAAA TTCACAGGTT TCCCGCTTCC CTCGAGTTTT 

97801 CATTTCCGAA GGGTCCCATG TAATATAAAA CTTATATTAA ATACATTTGT ATGCTTTTCT 

97861 CTTGCTAATC TTTTTTTTTG TTTTTTGAGA CTGAGCCTTG CTCTGTCACC CAGGCTGGAG 

97921 TGCAATGGCG CGATCTCGGC TCACTGCAAC CTCCGCTTCC CAGGTTCAAG CGATTCTACT 

97981 GCCTCGCCCT CCCGAGTAGC TGGGACCACA GATACGTGCC ACCATGCCCC GCTAATTTTT 

98041 GTATTTTTAG TAGAGACAGG GTTTCACCGT GTTGGCCAGG ATGTTCTCAA TCTCCTTACC 

98101 TCGTGATCCG CCCGCCTCGT CCTGCCAAAG TGCTCGGATT ACAGACGTGA GCCACTGCAC 

98161 CCGACCAATC TGTCTTTTTG TAGAGGGGCC TCAAGCATGA ACTTACTGAT GGGTGAGAAA 

98221 AACAGAATTT TCTTTTCCCC TACAATATAA ACATTAATTG TAATGTTATC ATTCAGGACA 

98281 TTTTGGTGAC CAATCTTACA GAAATTTTAT CTTGTGCAAG TCTATGCAAA CCAATATGTA 

98341 AATCTTCTAT AAGTGAGATT GTATTTCACT TTTCTAGTAT CCTTTTAAAT TAATAAAAGA 

98401 GATTCTAATG ATTATTTTCA TTACTGCATT TCATTGTAGG GAAGTAGATA ATTGCCCTTT 

98461 ATTCACTGAC CTTCGCTTTT TAAAAATTTA AACCATGTTA CCATGAAAAT GCTTTTCAGT 

98521 ATTTCTCTAC ACACAAGATT GCTGTAAGGG CAAAAATAGA GATAGGAATC ATGCATCCAT

98581 TGATATACAT ATTTTGATTT TTAATACATG TTACCAAGTT GCCTCCTGAA GGTCTGTTTA 

98641 CACTCTCACC AACAGGGTGT TTTTTCCTGA CTTCCACAAA TGCTCTTGAA CAGTGGGTGT

98701 GTTAGTCTGT TCAAATTGCC GACATGAACA ATTAAATCTC ATTGTTGTTT TTATTTTTAA 

98761 GACAATTATT GTTTGAGACT GCACATTTTG ATAATAACAT TTCTTCTATT ATGGTTTGAT 

98821 TACTCATGAT TCTTGCCCAT TTTCTTTTGG GATGTTGCCT TATGTACATT ATTTTAAATA

98881 GATAGCTCCA TGTATTAAAA GATTATTAAG TTTGAGGGCT TATGATATGT CAGTTACATT 

98941 TCTAAGATTT TTTTTTTTTT TTTTTTGAGA CGGAGTTTCA CACTTGTTGC CCAGGCTGGA 

99001 GTGCAATGGT GCGATCTCGG CTCACCGCAA CCTCCGCCTC CAGGGTTCAA GCAATTCTCC 

99061 TGCCTCAGCC TCCCCAGTAA TTGGGACTAC TGGCAAGCGC CACCACGCCT GGCTAATTTT 

99121 GTATTTTTAT TAGAGATGAG GTTTCTCCAT GTTGGTCAGA CTGGTCTCGA ACTGCCGACC 

99181 TCAGGTGATC CACCCGCCTC GGCCTCCCAA AGTGCTGGGA TTACAGGTAT GAGCCACTGG 

99241 GCCCGGCCAC ATTTCTAAAT TCTTTATAAG TATAAATTCA TTCAATCTTC ACCAAAACTC 

99301 AATGAAGTGT GAGTACTATT ATTATCATTG TTTTACAGAT CAAAACAAGT AATACAGTCA 

99361 CTTACTGAGT TCTATACACC TGGTAATTTT TTTGTTTCGT TGTTCTATCA ATTATTGGGG 

99421 AAGGGGTGTT GAAATCTCTA CCTTTAAATC ATGTATGTGT CTATTTCTCC TTTCGGTTCT 

99481 ATCAGGTTTT GCTACACATA TTTTGCAGTT CTGTTATTTG GTGCATATAC ATTTAGAATT 

99541 GCTTGTTTTT CGTATTGGAT TGACCCTGTT ATCATTATGT AATATCCCTG TCTGTTCCTA 

99601 GTAATTTTCT TTGCTCTGAA ATATACTTAT CTGATATATC ATCCAAAAGA CCACCAGGAT 

99661 GGCTAAAGAG TAGAAAGGAG AGATTTACTG GCAATACTAA TTTGCAAGCC AGGAAGAGAT 

99721 GGTCCCAGAA CCTGCCAAAA TTACTCTCTC TTTGGGGAGA AGGAGCAGGT TGGTTATTTT 

99781 TATGCCTCAT AGGCTATATA TTACACAATA GAGTCATACA TATTTAGCAC GTTTGGGGGG 

99841 ACAGCTATAT ATATTATGAG GGGTGCCAAG TGCATTCACA ATGGATAAAC ACGTGTAATA 

99901 TACCTCCCAT GTTCACTTCG AGGTTAAATT TTGGTTAAAA TGAGGTAGAA TTTAGGTCTT 

99961 TACATCACAA GGTGAACTAT AGGAACAAAG TTTACGTGCT GCCTCTAGCA GCTGGCTGAA 

100021 AATGGCTTAA GGTCTACAAT TACGTGTAAG AATAGAATGT GTGTCAAGGC GGTCCTCTGT 

100081 CCAATCAGAG TTGTAGTGGA CTGGACTGTA AATCAGAGTT AGGAGGGCTT CTGATAGCTC 
100141 CTATAGTTAA GGAATTTAGC AAGTGTGAGT TTTTTGGTAG TCTTTGGAAT TTAGGAATTT

100201 GCCATGCCAG CCAAGCCATG AATGCTCTAC CAGTAGGTAA CTTTGTTTGC TTAATCTTAG 

100261 AGTCTGTCTT AGTTGGTATA GGGGCATCTA TTTTGGTCTT TCAGATCCCA GATATTATTA 

100321 ATACAGATAC TCTTGCAGTT TTGGGCTGAT GTTTATATGG CTTATCTTTT TTGCAGCCTT 

100381 TAATTTCAAC CTGCGTTATG TTTATATTTG AAGTGAGATT CTTGCAGACA GTGTACAGTT 
103081
103201

103261 
103381

103441 

103501 

103561 

103621 



GTTGTTTTTT 
GCACAGTCTC 
CCTCTTGAGC 
TAGTAGAGAC 
TCCGCCAGCC 
AAACTGTTTT 
AGGGCTTAAG 
GCCAGAAATC 
TAACGTGTCT 
GCTTATATAC 
TATGGGAGGC 
CTCAGAAAGA 
TCTCCTGGAT 
TCTCTACAGA 
TGGCCAAGCA 
TGATTTCCTT 
TACCCTTGCT 
CATGCCCTTA 
TAGCCAGTCT 
TCATCTTACA 
TGTTAGAGGT 
TTTGAGTATC 
TGTATTCAAT 
TATGGTTTGG 
AGTGAAGCCT 
TCAGGGTAAT 
GTGACACCTC 
CACCTTCCAC 
ACCTCCTGTA 
TCAGTTTCAG 
ATATTTACAG 
GCTAGTGGGC 
TTTTTTTATT 
GTTGGAGTTG 
CCCCAATATT 
AATTTACTGT 
ACATGCCAGG 
TTCTCATCCA 
TTTCTAGTCT 
ACATATCCAA 
ACTTTCTGAA 
AAGTATTTAT 
ACATTCTCCT 
TATTTTTCTC 
GTGCACCTCC 
TGCTATGCTC 
GCTGAACTAA 
ACATTGATCA 
TTGTATGCAG 
GTATAATCTC 
TACCTTTTCA 
AGATCATTAT 
TCAGAAGACT 
AGTTTCTCTG 



TTTTTTTTGA 
AGCTCACTGC 
AGCTGGGATT 
AGGATTCACC 
TCGGCCTACC 
TTTATGGGTG 
TTCATGAAGG 
ACTGACAAGG 
ATGTGGGAGC 
CATTTTTAGA 
AAAAGAGGTT 
ACAGATGGTA 
CTGGGGAAAG 
TGTAAAATTT 
GCAGCCATTT 
TAGACTGGTG 
CTCTCAACAT 
GCTTCCCAGG 
GTGGTATTCT 
TGACTGATCC 
TCCTCTACCC 
TTCAATAGTA 
ATGCATAATT 
ATATTTGTCC 
GGTGAAAGGT 
CAATGGGTTC 
CCCCATCTCT 
CATGATTGGA 
CAGCCTGCAC 
GGATTCCCTT 
AATAGCTCAA 
ACTGATTTGG 
GTTTTCGCAA 
TTATTGGGAA 
TCCCTCCCCA 
TTTTGAAGCA 
CGCTTGTTGG 
TGGCTCAGTG 
GAGTTTTTGA 
GGCTCTTTCC 
CCACGGTTCC 
CCTTCCTACT 
GATGAAACTT 
CACAGCACTC 
CACTACAAGA 
CCTGCACCTA 
TAATGCTGGA 
ATCTTCTTTT 
AACGTGCACT 
TTCAGGGCAC 
AAGAAAATGA 
AATTTTGAAA 
TGGGAGAAGG 
AATCAAATCC 



GATGGAATTT 
AACCTCCGCC 
GCAGCCATGC 
ATGTTGCCCA 
AAAGTGCTGG 
TATTTATACC 
GTAGTGTGGG 
CAGATTGATT 
ATTCAGAATT 
TCACAGAAAG 
TGGCTTGCAA 
AATGTTTCTT 
GTATAGAAAG 
TTCCCATTTA 
CAAAATATGT 
GCCTTATAAG 
GTTATGATGC 
TTCTAGAACA 
GTTATAGTAT 
CTCCTACATC 
AGTACAAATG 
TATTTTCGTT 
ATTAGTCAGA 
CCTCTAAATC 
TTTTGGATCG 
TCACTTTGAG 
CTCGCTCAGC 
AGTTTCCTGA 
AACCGTGAGC 
ATAGTAATGC 
TCTGAAGTAC
AGCGTGTTCA 
ACCACGAGGC 
ACAACTTATT 
ATATCTGCCT 
CTTACTGAAA 
TTTGCTTAAT 
GAGTATAGAT 
AGCTACCCTT 
AAAATGGTCT 
TGACATTTTC 
TGGCTGGCTT 
TCCATCCTTA 
ATCACTTATC 
CAAGTAGCAC 
GAACACTCTC 
TATACATCTC 
CCATGTGCTT 
GCTATTTAAT 
TATCTGAGAT 
GCCAGTGATT 
AGGGAAGTTG 
CAAAAAACAA 
ATAGTTCTGT 



CACTCTTGTT 
TCCTGGGTTC 
GCCACCACAC 
GGCTGGTCTC 
GATTACAGGT 
ACACACATTT 
AACCATAGTC 
AATAGGTGAA 
AATTACCTAA 
AATTGGGGCT 
AGGTGGCCTT 
TTATGATTTT 
GTGAGGAGGC 
AGGCAGCTTT 
CAAAGAAATA 
AAAAGGAAGA 
AGTAAGAAGG 
GTAGGAAATA 
CACAAAATGG 
ATACACATAC 
TACTACAAAT 
AACTTTTGTA 
TGTTTTACAT 
TCATGTTGAA 
TGAGGGTGAA 
TTCACAAGAG 
TCTCACCATA 
GGACTTGCCA 
CAAAAAAAAT 
AAGAACGAAC 
CCTTTTTCAA 
AGGGTGAATT 
ATAGATTGTC 
TTCCTCTTAT 
TTTGTATGTT 
GGATTGCCAT 
TCAAGGTAAC 
TACTGATATT 
AATCTTGGTT 
ACGATTTGTT 
TGGACTTCAA 
CTTCCTTGCC 
TTTCTATTCT 
TCTACATTTT 
CGTAAGGAAA 
TGGCACTTAG 
CCTCATGAAC 
TTGTATGATT 
CTTCATGTAC 
AACTTTTTAA 
ACTGATGTTT 
AATATTGTGA 
ACTAAAAATG 
GACAGCGTTG 



GTCCAGGCTG 
AAGGGATTCT 
CCGGCTAATT 
GAACTCCTGA 
GTGAGACCTC 
AATGCAATTA 
TCTTGGCCCA 
AAGGCATTTT 
CTTCCCAATG 
TAGATTCTGG 
GTTAGGTAGG 
TAAGTGTCAG 
ATGGCTGCAT 
GCAAGCCCAT 
TATTTTGGGG 
GACACCTGAG 
CCCTCACCAG 



ACTAAGTAAC 
ACAGGCCACA 
TATATATGTA 
GTCAAAATGT 
TCTTTCTTCA 
ATGTAATCTC 
CCCCTCATGA 
ATCTGGTTCT 
TGATATGCCT 
GTAGCAGATG 
TACTTTTCTT 
TAACACACTA 
CTTCACAGTA 
GTATTATGCA 
TTACTTTCTC 
ATTTATATGG 
TTTTGAAGGC 
CAAGTTGTTT 
TTGGATGAGA 
GTGACTGGAT 
TCAATTTTAT 
TAGGAAGTTA 
ACACATCCAG 
TTCAGGTCTG 
TTTTTCTTAT 
CATTATGTAT 
CAGGTTGTCT 
CAGGTTTTCA 
TCTCTAAATC 
TATTGCTCAA 
GTAAGTCCTC 
CATCTCCATC 
ACGGCTATTG 
AGGGAAAGAT 
AGCACTTTTA 
GCTTAGAAGC 



GGGTGCAGTG 
CCTGCCTCAG 
TTTGTATTTT 
CCTCAAGTGA 
GCGCCCAGCC 
TTGATATCTT 
CTAAATGTTT 
ACCTATTGTT 
AGTTATAGAT 
TAAAACAGGT 
TGAAGCCTCC 
ACTCTCAGTC 
TAATGGAGAT 
TTCTGCCTGC 
TAAAATATTT 
CTGACACACA 
ATACTAATTC 
CTTTAAAAGT 
TATATTATGA 
TTTGGAACAT 
TTTTTAAATT 
CATTATAACA 
TACTAAGTGA 
CAATGTTGGA 
AGCGCACTCT 
TTAAAAGAGT 
ACTCCCTCTT 
CCTGCACCAC 
TATAAATTAG 
AGTCTATTTC 
GCTACTTGTA 
ATTAACAGAT 
TGCTCCTGGT 
AATAAATAAC 
AAGTGCCTAG 
TGCTAATAGT 
AGAAGAGTTT 
GTACTCCTGC 
CTAGCCCTGT 
GAATAGCTGT 
CATTTTATCG 
AATTCAAATG 
CCCCTTTCTT 
TTACCTTATT 
GCTTTTTCAC 
GTAAATATAT 
CTTCTAATTT 
AATCTTTATT 
CCTTCTCTGA 
ATGAATCTTG 
TTGAGGGTGA 
AACACTAGAG 
GTCTCCTGAC 
AGATTTTTTT 



Figure 9 (Page 32 of 74) 



SUBSTITUTE SHEET (RULE 26) 



WO 98/14466 



PCT/US97/17658 



121/162 



103681 

103741 

103801 

103861 

103921 

103981 

104041 

104101 

104161 

104221 

104281 

104341 

104401 

104461 

104521

104581 

104641 

104701 

104761 

104821 

104881 

104941

105001 

105061 

105121 

105181 

105241 

105301 

105361 

105421 

105481 

105541 

105601 

105661 

105721 

105781 

105841 

105901 

105961 

106021 

106081 

106141 

106201 

106261 

106321 

106381 

106441 

106501 

106561 

106621 

106681 

106741 

106801 

106861 



CGGCTCACTG 
TAGCTGGGAT 
ACTGGGGTTT 
CGCCTTGGCC 
AGATTTTTTT 
CACAGATAGA 
TACTCCATCT 
AATCTGTCTT 
TCCCGGCCAA 
AACACAGATT 
TTTAGAATTA 
AGATAAACAG 
AGGCTTGTCT 
AGGACAAGAC 
AGAGTAATAT 
CCTACCTTGA 
GACAGGAAGG 
TTTCTGTGTT 
CAGTGTTGTG 
TTCTTTGAAC 
TAATCAAAAA 
TTCCCTGGGG 
GGTTAACACC 
TATTTTTCTC 
TTATGAATCA 
TCTTCCCTCC 
CGCCCACGCT 
CAAGCGATCC 
CCCGGCTTTT 
GGGTTTCTCC 
AAAGTGCTGG 
TATTTAGAAA 
GCACGAGCGG 
TATTTTGACA 
AGCCAGACTG 
ATGAAGGATG 
CTCACAGCCT 
ATTACATTTT 
AAACAAGGCG 
TTTCCTGTGG 
GCTGGCGCGC 
CTGGCGGGCA 
TTGGCCATCC 
GGTGGCGTTT 
AAGGCCAAGG 
AAAATCAGCC 
TGTTGTGCTT 
AGTGTGGCAC 
CTAGGTATGT 
AAGTTTCACA 
TCCTAACTAT 
CATACCATTT 
AACATTTGAG 



TTGAAATGGA 
CAACCTCTGT 
TACAGGCTCC 
CACCATGTTG 
TCCCAAAGTG 
ACACTCATGT 
AGTAGTAGAT 
GCTCCTATCT 
GATTTTAGGT 
GGAAAAACTT 
AACTGGAGAA 
AGACTGAAAG 
CTGTATAGGG 
GTCAAGATTC 
TCTCTTTTAG 
TTTTAGGTTT 
GGAGGAATTC 
CAGAAGGTGG 
ATGGAATGTT 
AAAAAGTTCA 
TGAGGGCACC 
TTTGAAAATT 
AATCTCATCA 
ATCTAAACAG 
CAAAATCATA 
AGAGAGCTTA 
CACTCCCCCT 
GGAGTGTGGT 
TCCCACCTCA 
TTTTTTTCTT 
ATGTTGTCCA 
TATTACGGGC 
TTGGTCGGAG 
CTGAAAGTCA 
AAATCCTAAT 
GGGATTGGGT 
CAGATTCTGA 
ACCTCCAGTC 
CTTGTGGCGA 
GTAAAGCTCG 
GCCGAGTGCA 
CGGTGTATCT 
ATGCGGCCCG 
GCAATGACGA 
TGCCTAATAT 
GAAAGTGAAG 
TAACAGCAAA 
TGGATTATGC 
TTTTAGTAAT 
GGGAGAAGTG 
CACAGCAGTT 
CTTGAATGGA 
GCTGTAGCAA 
TATGTATTTC 



GTTTCGCTCT 
CTCCAGGGTT 
CACAACCACG 
GCCAGGCTGG 
TTGGGATTAC 
TTCTTTTTCC 
ACCTCAGAAA 
CATGGAATAT 
TCCTCAACAG 
CCCCTTTGCC 
AAGGCATATA 
ATACAGGGGA 
TACGATCTAA 
TTCTTGACCT 
AATGGGGGGT 
TATGGCTGGT 
TGGTTTCTAT 
TCAGTGAAAC 
TGTTCTCTCA 
GGAAATGCAA 
TAGGAAACAG 
AAAAAAAAAT 
ACCAGAGAAG 
ACTTTGTCAC 
TACTCTCCCC 
TAAGCTTCTA 
CCCCTTTTTT 
GGCTCTATGT 
GCTTCTCGAG 
TTTCTCCCCC 
CGCTGGTCTC 
ATGAGCCACT 
TCCACTCCTT 
AAATAACCAG 
TCGGCCAATT 
CAAACATAAA 
TTTCCCATTG 
AGTATAAATA 
TTTTCCCTTC 
CGCCAAGGCT 
CCGCCTGCTC 
CGCGGCGGTG 
CGACAACAAG 
GGAGCTTAAT 
TCAGGCGGTG 
AGTTAACGCT 
GGCTCTTTTC 
CGCCCATAAA 
TTGTCCTGCA 
CCATGCAGCA 
ACTACATTTT 
AGTGTTAAAA 
TTAATGGCAT 
CCAAAATGAG 



TGCCCAGGCT 
CAAGCGATTC 
CCCAGCTAAT 
TTACGAACTC 
AGGCATCAGC 
TTCTGTCATC 
TTCCTGGAAT 
AAAAGGAAAA 
GAGAGCCAGA 
CTCCCAAGGT 
TATTTATTTC 
AATTGCCCAT 
TGCTAACAGA 
CTCAGTGCAG 
CTTATGACCT 
TCTAGGGAAA 
GGCTAGACTT 
ACTTTTATAA 
TTTCCTGAAA 
CTCAAAAATG 
TAAATTCAAG 
TCAAAAAGGA 
ATTAACTGTA 
AGCTGTCACC 
TAAGTTGCCT 
CAGTTCACTG 
TGTCTTTGAG 
GAACTCACTG 
TAACTGGAAC 
GTTTCTTTTT 
GAACGCCTGA 
GCGCCCGATT 
TCCAAAAACA 
AACAAAACCT 
ATTATTAGTA 
CCTTACACCA 
GGTATTTGAC 
CTTCTCTGCC 
TTATCAGAAG 
AAGACTCGGT 
CGCAAAGGCA 
CTTGAGTACC 
AAGACCCGCA 
AAACTTTTGG 
CTGCTGCCTA 
TCATGCACTG 
AGAGCCACCT 
GATGTTTTTG 
GAAATTAGAT 
CAAAACATGT 
AGAGGAAGGA 
CCCGCATGCC 
ACACAATTGA 
CTTTTTTCCA 



GGAGTGCAGT 
TCCTGCTTCA 
TTTTTGTATT 
CTGTTCTCAA 
CACCGTGCCC 
CTGTTTCAGT 
AATTAATCCA 
ACACCAAGAT 
CAATGGCTGT 
TTATGGAAAA 
ATCACAATTT 
TTTTATGCTT 
CTGAGTGGGG 
CATTTCTTCC 
ACAGGCAAAC 
AGGAGTTCTG 
TGGGGAGAAT 
TCATAATCCC 
GATTCCAGAG 
TGCCACTTTG 
GAAGGGCTTT 
ATTTAGTTGT 
TCACAGGAGA 
TATTCTTTGA 
ACATCCCCCT 
GGATTTGGGG 
ACACAGTCTT 
CAACCTCCTC 
TACAGGCGTG 
TGGTTATTTT 
CCCGCCGTCC 
TGAAGGACCT 
TGAGTCACAA 
CCACTCATGC 
TTCAAGTCGA 
GACGGAAGGA 
ATTAGCCAAT 
TTGCGTTCTA 
TAGTTATGTC 
CTTCTCGTGC 
ACTACTCCGA 
TGACCGCCGA 
TCATCCCGCG 
GGCGTGTGAC 
AGAAAACTGA 
CTGTTTTTCT 
ACGACTTCCA 
AGGTGTTTTT 
CCATAGAAAC 
TTACAGGGGT 
AATTATACCC 
CCACACAAGT 
GAGCACACAC 
GTTTGGGGAT 



GGCACGATCT 
GCCTATGGAG 
TTTAGTGAAG 
GTGATCTGCC 
AGCCAGGAGC 
ATAAGCAGAC 
CGTTCATCTG 
TTCCCTAGGC 
AATAATATTG 
TTACTGGCAA 
TACAGGAGAT 
AGGTTCAACA 
AAGCCCCGCA 
TTCTGGTTAT 
AAGGTAGGTT 
GTTTGTATGG 
GGGACTTACA 
ATTTTGAGTA 
ACTCCTCATT 
TTACGCTGAT 
CGCTGAACTC 
TAAGATTCAC 
GGAGACTGGT 
AACACCCATT 
TCTTTCTCCC 
TATTCGCTTT 
CTGGCTCTGT 
CTCTCGGGTT 
CACTACCAAG 
ACTGGAGACA 
TCGGCCTCCC 
CTTAAATATC 
TCCGGGAAAA 
TTAAAAAAGG 
AGGCTCGTCA 
TTACATGCAA 
GGGAGAATTC 
ATGTAGTTTC 
TGGTCGCGGC 
AGGTTTGCAG 
GCGCGTCGGG 
GATCCTGGAG 
CCACCTGCAA 
CATCGCGCAG 
GAGCCATCAT 
GTCAGCAGAC 
TTAAATGAGC 
AATGGCTTTG
CTCAGGAATT 
GATTCGCGTT 
ATGAGTGCAT 
TTGAATATGT 
ATTACCACTG 
GTTTTGCTTT 
GTTTTGGGGT
CTGTAACCTC 
ACAGGCGAGA 
CGACACCCAG 
CAATATTCTC 
CTTTGTGTGG 
AGTGGATATA 
GGAGCCTCTC 
ATTTTTGAAT 
CTTTCCTTTC 
GTACTAAACA 
CCAAGGCTGC 
GCCCGGGCAC 
TTCGGAAGCT 
TTCGCTTCCA 
GGCTCTTTGA 
AAGACATCCA 
TCATCAGTCT 
GCTGTGATAA 
TTAGACTATG 
GGTTATTTCT 
TCAGGCCTCT 
AAATCTGGTA 
AGAAAATTGT 
GCTCAATACA 
ATATATATAT 
GGTGCAAAAT 
CTTTAAATCT 
ATGTGTGGCT 
ATGCACCATG 
TAGAATTTAC 
TGGGGATTGA 
AATATTTACT 
TTTTACAGAC 
AGACCTTGCC 
GCCTACATCT 
ACAACAAATC 
GTGTTCCAGC 
TTATGAAAAT 
GACTCTAACT 
ATGGACTTGG 
TATTTTTTTT 
TTCTTTATCT 
AATAAAGTTA 
ATTTTCCACT 
TAACTCTTCA 
AAAGAAAATT 
AAGAATCTAT 
TGTTGTTTGC 
CTAAAGGCAC 
AACAGGAGAT 
TTCTTCAGCC 
CGAGAGACTT 
TCATATGGTT 



GGAGTCTCCC 
GAACTCGGGC 
GCCGCCACGC 
AAAAATACAA 
TGATTTCTTT 
TTGTAAATTT 
GCAGCTAAGG 
TTAATCTGCA 
TTTCTTGGGT 
CTCCACAGAC 
GACAGCTCGG 
TCGCAAGAGC 
TGTGGCTCTG 
GCCGTTCCAG 
GAGCTCTGCG 
GGACACAAAC 
GCTCGCTCGC 
TAAAACCCAA 
TTTTTTGTTG 
GTCTTAAAGT 
GACCTTATTA 
AGCTTGCTAT 
AGTAGTTAAC 
GTCTTGCGAG 
TAGTCCCCTA 
ATATACTGTT 
GTGAGGCAGG 
GTCAGTCTGT 
TTGCTTGTAA 
ACATGCCACA 
AAGTTTTAAC 
GTACTGGAAG 
TTAAAATTTT 
TAACTTTAGA 
TCACATTCTT 
AGAATGTAAA 
ACACACACAA 
TTTTAATAAG 
GAATATGTCA 
GGCATAGACA 
TCTATGCCAA 
CCAGTTATAG 
GTAGGAAACA 
CATTACTGTC 
CCCTCACTTA 
TTGACAGTTG 
TATTGAGCAT 
TGTTTTGTAT 
AGAATATACC 
TTCAAAAACA 
AAAAGTTCCA 
CAGAGGCATA 
CTATACACAA 
TACTTTCCCA 



TCTCGCCCAA 

TCAAGCGATC 

CCGGCTAAGA 

TTTTAAATAA 

TTTATATTTT 

TAAGACTTCA 

GGTTAACAAA 

ACCAGGCACA 

CCAATAGTTG 

GTCTCTGCAG 

AAATCCACCG 

GCGCCGGCTA 

CGCGAGATCC 

CGCCTGGTGC 

GTGATGGCGC 

CTTTGCGCCA 

CGCATTCGCG 

AGGCTCTTTT 

TCTTAACAGA 

TGATTAACAG 

AGGTGCTATT 

GATTAGCATT 

TGGCGCTTAC 

TTCCAGTGTC 

GGTTTTCTCA 

AAATTCATTT 

GATCTAACTG 

CGACCAAGCA 

ATAGTCTATC 

TTCTTTTTTT 

CATTTTCTTT 

AAAATTTAGA 

TATATTTTGT 

ACAACCACAG 

TTTTACAATA 

CTGATGTACC 

AAGATCAAAT 

GCAGTTTTTG 

GTTTGTTTTA 

TTTGTTATCC 

GGTGACTACT 

ATGTGCTGGA 

AATGTGTTGG 

TGAGGATCAG 

CATTCTTTGC 

ATATTTAAAA 

TTTGTATTTG 

TAGAGGAGTA 

ATCCAAAAAT 

GCATTCAAGA 

ATGTGAAAAA 

GATGAGATAA 

ACAAACCTTG 

CAATTGCCTC 



GCTGGAGTGC 
CTCTTGACAG 
GCATTTTTCT 
AGCGCATATG 
AACTAGAAAC 
GGAAACTTTT 
ATGACGTCAG 
GAGATGGACC 
GTGGTCTGAC 
GCAAGCTTTT 
GCGGTAAAGC 
CCGGCGGCGT 
GCCGCTACCA 
GAGAAATCGC 
TGCAGGAGGC 
TCCATGCTAA 
GAGAAAGAGC 
CAGAGCCACC 
ACAAATTTCT 
AAATAACGGT 
TGGAGAGAAG 
TGTTTAAACA 
TAGGCATTTT 
TTCCTCAAAA 
TATATTATAT 
GGCTGTTAAC 
GCTCTCATTT 
TAATTTAATC 
TGGTTGCATT 
CAGTACTTCT 
CTGTTGATCT 
GGGATGGGAA 
ATTTTTTTAT 
AATGTCCAAC 
AATATTTTTT 
ATACTAAAAT 
TTGAATTGCA 
GTTTATAAAG 
TGATTCGTTT 
ACAGACAGTA 
CACAAGCTCT 
TCTGATGTAT 
AGGTACTGGG 
ATGGACAGGG 
CCCCTCCTCA 
ATTAACGAAT 
TGAGTAGTGC 
ATTAAGGAGA 
AGACCACTGT 
AGGGAATTCT 
TGCTCTGCTT 
TTCTGCACAA 
TTAAAATAAT 
TCTTTAACTT 



AGCGGCGTGA 

CCTTCTGAGT 

AATTGCCCAC 

CAAATTTCCC 

AATTGGAGGT 

CCAGTACAAG 

AGTAGCTACG 

AATCCAAGAA 

TCTATAAAAG 

CTGTGGTTTT 

GCCACGCAAG 

GAAAAAGCCT 

AAAGTCGACC 

CCAAGACTTC 

TTGTGAGGCC 

GCGAGTGACT 

GTAAATGTAA 

CACTTATTCC 

AAGGACCCCC 

TTGGTCAGTC 

CTGTGTAAGT 

ACTTTGTAAG 

TGCAAAGCTT 

TGCTTAGGAA 

ATATATATAT 

ATTAACCTGA 

TATCCATAGC 

CCTTATATAT 

GCTTTGTCTC 

TGCCTGTAGT 

TGCTTTTCGG 

TACTGTACGC 

CATATAGCTT 

ATTAAAACTA 

ACACCTAACA 

CGCCTGACCA 

TCGTTTACTT 

TAATATTTGC 

TTCTTGACTC 

TAGATATGTT 

GGGCCCAGCT 

AGCGCTTGAC 

TCTGACGAAT 

GGTGGTAGCT 

ACAGAACAAG 

GGATGAAATT 

AAACATTTTA 

GATTGGAGAC 

GGGATCAGGA 

TCTAAACTTT 

GTACCAGGTG 

ACACAGCAGG 

CATATATTCC 

AATGTGAAAG 



TAACAGCTCA 

AGCTGGGATT 

ACTTCTTATG 

TAATCGTCTC 

TTCCGCGTTG 

ACTTGTCCAC 

GTAATGGGCA 

GGGCGCGGGG 

AAGAGTAGCT 

GCCATGGCTC 

CAGCTGGCTA 

CACCGTTACC 

GAGTTGCTGA 

AAGACCGATC 

TACTTGGTAG 

ATTATGCCCA 

AGTTACTTTT 

AACGAAAGTA 

CCGGAAAGCA 

TTGCAGTGTA 

CCACTATCAT 

AGTAAGGGAA 

TGAAAAGATT 

GATTTTCTCA 

ATATATATAT 

AATTTATTCT 

TAGCTACCCA 

GAATTTTTAT 

CTCTAGGACT 

TATTAAAATC 

TTTTGGAGGT 

AAACAAAAGT 

TTACATCACA 

CTAATTCCAA 

TTCTTTCTTG 

ACTGTCAACA 

AAATTCATTT 

ATTTTAAAAA 

TTATACAAGC 

AGAGATGCCA 

GAAGGTCAAG 

TTTTTATATT 

AGCATAAAAG 

CAGTCCAGCT 

GATTCTGCTG 

CTCATTTGTG 

ATATTATATT 

AAAAAGGGGG 

TTCTTTTGAG 

TCTTTCTGAA 

AAAAGACATA 

GAGTCATAGC 

TTTAATCTCC 

CATTTAGCTT 
AGTAGGACTG
AAGCAGGCTG 
TAAATAAGGA 



TTTGGGGCTT 
TTGTATGCTT 
AATGTTGTTT
TATCCCCTGT 
AGAAGTGAAA 
CCTGGCTCAT 
AGCCAGAGAA 
GGACCCACCC 
GAATAAACAG 
CTGAAACAGT 
GCTGAAGTTT 
GTGAGATGGC 
AAAATTTAGC 
AAATTTAAAA 
ATAGGAAAGC 
TTCGGACAAT 
CGTTATTACA 
TAGCAATATA 
CTGAGCAGAT 
TCTTTTCCAC 
ATCATCATAA 
AATGAGACAG 
CTGGACCAGC 
TTTAACACAG 
CAGTTAGCTA 
AAGGGATTTA 
GGTTTGTACA 
TAAACTACAG 
TCAGCTTTCA 
TTTGAATAGT 
TATTAGTGAA 
AGATTCAAAT 
TATTGTGTTC 
AGGCAGCAAC 
GTGTATAAAT 
TAGACATGTC 
ACCGAAAAGT 
CTAGGCGTGG 
TCAAGGTGTT 
GCAAGACTGT 
ACGGCTTCGG 
CACTCCCTCT 
TACTCGGCTA 
GTTTGGTCTG 
GTGAGCTATA 
AGTCCGATAA 
CGAGAGCCTA 
CTGTTTTCCC 
TAACACCTGC 
CTAGTTGAGC 
GCGGTTAAAG 
CCTACGCAGT 
CTCAAATTTC 



123/162 

CACTTTTTTA 
TCATTCTGCT 
GAGTAGAGGT 
GTTCCCAGTT 
GAACTGCACG 
ATGCCCTGTT 
CCTGACTCAA 
CAACCCCCTT 
CATCTCCCTT 
CCTTGTTGCT 
TTAACAGGGT 
ACAGCTAATG 
TAAGATTCTC 
AGTAGCCCTT 
CAACAGTGAG 
ACGAATGCTG 
TTTAACATGT 
GGCTGTGATT 
GTGTTGGGAG 
TTTTTTAAGT 
AGTCTCTTCT 
CCCGATTTAC 
TACAGTGTGA 
AAATTAAATT 
GTTTTACGTG 
CTTATATAAT 
GGCTAGAATA 
GAATTCTCTT 
TGAGAACATC 
AGAAAACATA 
CAATATGTCT 
AGCCCTTAAT 
AGTAGTGTCG 
TCCCTCGCAG 
GCCTCCTTAT 
GCTCGTGGCT 
TGGTCGCGGC 
GCTGCGGGAT 
TGGGGTTAAG 
TCTGGAGAAC 
CACTGCCATG
CGGTTAATCT 
CAGAAAGAGC 
TTCTGCCTAG 
AGTGGCTGCT 
AACTTCAATG 
ATGAGTAGCT 
AGATGCTAGC 
AGGCGTATCT 
CGTCCTTAAC 
TTGGCGTCCT 
CTGCTTTGCT 
TCGTAAATTC 
TTCTGACTCC 



TGAGGGTTCT 
AATCTGTTTT 
AAAATTTTGC 
GTTCAGTTTG 
TATGCCTCTA 
CCTGCCTTAA 
AAGCTCCCCC 
TGACTGTAAT 
CGCTGACTCT 
CACACAAACC 
TTTTCCTGCC 
CTGTTGAAGT 
AGAGAAAGAA 
GCAGTTTTTC 
AAGCGTGTAT 
CTATTGTATT 
CACATATGAA 
TTAAAAAAAC 
TAAAAACACG 
AGTAACATCT 
CATGCCTCGT
AGATGAAGGC 
AGGATAGCAA 
AATGTAAAAT 
ACACAGGTGC 
GAATTGGACA 
TATAACTGTG 
GGCCTCAGCC 
TTTCATATGA 
AGGCCAGAGT 
TCAAGCACTT 
ATAAGCCCTT 
TAAACGGGAG 
TCCTTAGGTC 
CCTCGCTCCC 
TGCTTTCTTT 
AAAGGCGGTA 
AACATCCAAG 
CGAATTTCCG 
GTGATCCGGG 
GATGTGGTTT 
TTTCGTCAGT 
TGTGATTGTA 
TATGTAGAAC 
AAAGCAGAAA 
CTATAGTTTT 
CAGCTTTTTA 
TGCCTGGAAC 
GACTTAACGT 
CGCCCCCTGC 
GCTGAGTGAC 
ATTTTCAGTC 
CCACTTAGTA 
GAGGTCCGTG 



CCTGTCCCAT 
ATGGCAAATG 
CTCCCTACAA 
TCAGGCCTCT 
GATGGCCTGA 
CTGATGACAT 
ACTGAGCACC 
TTTCCACTAT 
TTTCGGACTC 
CTGTTTGATG 
CAGTCACAAC 
CTAAAATCAG 
GTCAAGTTTG 
CAATAGAAGT 
GGAGAGTTGA 
GCACCTTGGA 
AAGCTAAACG 
AATCCTTACT 
AAAATGAGAG 
AAAATTAAAC 
TCACATTAGC 
ACGGTTGCAA 
AACTCCACTC 
GGATTAACAG 
TCTCATAAGG 
ATTAGTAAAA 
TAGAGAAGCG 
TCCTATCCTT 
GAATTTCACC 
GATCTTTTCA 
GAAAGACTTA 
ATTAAAATTC 
GGAAAAACTA 
ACTGCCCCTC 
GCTTTCAGTT 
TCGCGTACCT 
AAGGTTTGGG 
GCATCACCAA 
GTTTGATTTA 
ACGCCGTGAC 
ACGCGCTCAA 
TTTCTTCCAA 
TTCTTTCGGA 
TATTATAAAC 
TCAGCTAAGT 
GACATGTCAA 
GTTTTAAAAA 
TGAGTAGGTG 
CAGCAAAAGC 
CGGTAGCGCC 
GTCACCTCCC 
CTCAGGCTGG 
GACTAAGGGA 
GCAGCAGCTA 



AAAATTTACA 
AATTATCAGG 
GATAGAGATT 
GAGCCGAAGC 
AGTAACTGAA 
TACCTTGTGA 
TTGTGACCCC 
CTACCCAAAT 
AGCCCGCCTG 
GTCTCTTCAC 
AAAGTGATGT 
TTTTGGTTTG 
GGGTGCATTT 
GATTTACGAA 
ACTACACTCC 
AAAGAGAACA 
GAATCTGTCA 
AATACATACA 
TTCAGGACAA 
CATATTATGT 
TAATTAAAAG 
TGAGCTATCA 
CCATCCTCTT 
GAGAAAGGTA 
TAATGAAAGC 
TGTAAAAATG 
CCCAGCAAGG 
GAGAAGAATG 
TACTGCTTCT 
CGCCTGCTCT 
AAAAGTTTAC 
TCAGTCGAGG 
AAGGGATTAA 
GAGGGGCGGA 
CTCAATAAGG 
GGTTTTTGTT 
TAAGGGAGGT 
ACCGGCCATT 
TGAGGAGACT 
CTACACGGAG 
GCGTCAAGGA
TGGCCCTTTT 
TGGTAACATC 
CAGTTGGGAG 
AAACGAGGTC 
GCAACTTAAC 
CGAGTTGTGC 
GATTAAAATG 
TGTACTTTTA 
AGAAGCCTTT 
CCTTCTGTGG 
AGGCTCCCCT 
GTCTGTTTTA 
TAAGATGGAA 
AGTCTTCTAT
AAAGCTGTAC

CTTGGCTAAC 
GTGATAGTGG

TCTATCAGTT 

TAGGCCCGGC 

GATCACCTGG 

CTAAAAAAAA 

GGAGGGTGAG 

CGCGCTATTA 

GCAATTATAA 

ATGTATCTAA 
CTAGGAGCAA

TGTTTTCACT 

CTTTTTGAAG 

TCATAATAAA 

TTGCCTGACT 

TCCAAGCAAA 

CCTCATTCTC 

CATTTCCAGC 

TATCTGAATG 

AAGAGACCTC 

TATATAGTTG 

ATGTAATAAT 

TCTCTCTCCC 

TATCCCTAGC 

TGAAAGGATG 

GCTAACACAA 

TGCAGTTTCT 

TCTTTGTTCT 

GGGTATGCCT 

AGGCCAGGCA 

CATATGGGTA 

ATTAAAATTT 

ATAAATATCT 

CTCATTTTTG 

TGTCAGTTTT 

AAAACTGTCA 

ATCGTGAGGC 

TTGTGAATAT 

ATCCCTTTTC 

AGGAACATCC 



ATGTAAGATT 

TCCTGCCTTA 

CCTAGGAAGT 

TTTGATTTAG 

TGGTTACTTT 

AGTGAGTTCG 

ATAAATGCTG 

TTGTAATAGT 

TACATCCATA 

TCCACTACAT 

TCAACTGGCC 

CCACGGCCTA 

CAGCTTCGGG 

TGCTTCCACC 

AAGGTGGCTC 

GGTCAGGGGT 

AAAAAAATTA 

ACAGGAGAAT 

CACTTAGGCC 
TTGAATAATG

TCTTCCTCTG 

AGGAATTGTG 

TCTGTTGAAA 
CTAGGATGAG

TTCGCAGTTC 
CCATAATCAT
ACCTTGAACA

TGTGAATTTT 
TTACTCAGAT

CAGCTGAGTT 

TTTAAAACTA 



CTCAGATGAC 

AATTGTAAAT 

GTCAAAGTTA 

TAACAAATAT 
GGTGAGAGGT

AGACCAGATG 
CTGTTTCTCA

CACTTCAAGC 

ACGCCTGTAA 

TCGAGACCAG 

GCTGGGCATG 

AGCTTGAACT 

TGGGAGACAA 

TGACCTTAAA 
CGGTTTGTCT

CTGTCATACA 

CACTGTCTGC 
TTATATCTAT
ATAGCATATA
TCTGGGGGAT
GGAGCTAGTC
GTACCCCAAT

AACTAAGAAA 

TCCTGAAACT 

AAGCTGGGTA 

AGGACCTCAA 

CCTATAACAA 

GAGAGTAAGA 

CTGAAGCTAG 

TGGCTACAAT 



TTGCATCTTC 
TCCAAAACTG 
GGTGACCAGA 
ATTGATGGCT 
TTCTCCTGAC 
CTGAGCTGGA 
AGATGGCTAA 
CAACTGTAAA 
TTTCTCTACA 
ACTCTGGCTC 
CATGCTCTGT 
CCACGTTAAC 
TTCCCGGGAT 
TTCAGAACGA 
TCCCGGCACT 
CCTGGCCAAT 
GTTGCGGGCG 
CGGGAGGCAG 
GAGTGAAACT 
TCTCTAGACT 
ACTTGTCCAA 
TGTCAATCAA 
TGTCCTGAGC 
ATACCTTTCA 
GAATGACCAC 
GTTTTTCTCT 
TGAAAAGATT 
TAAGACTTAC
TACTAGATCT 
CTGACCTTGC 
TCTTTCTCCC 
ATAAGCTTTT
TATATAACAT 
ATAACTAGAT 
GAAGGTAGGG 
TTAACAGGTA 
CTCCAGGCTC 
TTGACCTGCC 
GCAATGCTAT
GTGGCTTATT 
CTACATTGAC 
GACATAAAAT
CTTGTGTAAT
TGGTACCAAA
TATTTCCCTA 
ATGTACTTAA 
GGCTTGACTG 



ACTGTACCTG 
ATTTAATTGT 
TTTTTAGAAG 
ACTTCAGCAA 
AGGAGGATAT 
GATAAAAATG 
AAACTGAAAC 
GTTTTCATCA 
CAGCCCTACT 
AGTTCTTGGA 
GCTCTGTCAA 
TTCTAGCAAT 
AACATAGAGC
TTGGAAAGCT 
ATTGTGAAAC 
ACTGTAATCC 
AAGTTGCAGT 
GTGTCTCTAA 
CATATACAAC 
TTCTTACAAC
TTGTTTTCTT 
CTGCTCATTT
ACTTTGTTTT
ACTGCCTGAA
TCTGGAATTC
ATATCTGGCA 
CAGCTATGAA
TCCTCAAGAA 
CATCCATCAC 
AAGAGCACAA 
TCAGCTAACC 
CTGGAACACA 
GACAAACCCC 



TCAACCCAAT 

GAAAGTTTCA 

TCAGCCAAAT 

AAAAAATCAA 

AGTGAATAGG 

TGTGAGTCAT 

ATAATGTAGT 

GAAAGGACTA 

AAAGAATGAG 

CTCCTCTTTT 

ATAGTTTGTT 

GCCAAAGCCT 

CTCTCCAAAT 

TTAAGAAATA 

GAGCCTGGTG 

CCCGTCTCTA 

AAGCTACTCG 

GAGTTGAGAT 

ATAAGTGTTT 

TGCATATTTG 

TACGTAAACA 

TTCTTCAAGC 

TTTCACCCAA 

TGCTGCTTCT 

AACTGGTCTT 

AACAGTGATC 

TCCATCACCT 

CCATTACAGT 

CTTAGAAAGT 

TACCCTCTGA 

ATGGACCAAG 

TTGTCCTGGA 

AACCTTAAGA 

AGCATATATA

AGAATTATAT 

TTGTGTTTGT 

GTCCCAGAAG 

GTTTGTTGAA 

CCACCAGAAT 

GCAATCAGCG 

AGCAACATTA 

AGGTGGTGGT 

ATGCTATGAA 

CTATACCTCC 

AGGAAAGGGA 

TATGAGATCA 

CCATACCAGG 

TTCTCTCCAT 

ATCCTGGATG 

TGAAGCTAAC 

TGAGAGGCAT 

TGAAGTTCAC 

TAACTGCATC 

AGGCTTCCAG 
GTTTAGCACA
CCACCCTGCT 
TCAGCTTTAT 
ATTTTACCGT 
ATCACCCAAA 
GCGGTAACAC 
TAGTGTCCTG 
GAGTAAAACT 
ACTTAAATGT 
AAACAGGGTT 
GCCTCAGGAA 
GATTGGGCAG 
TCCACTCTTT 
GGCAAACTTG 
TTAATTACAC 
GGCCCTGAAA 
CGCCAAAGCC 
TGACTGTCTT 
AAAACACCTT 
CGCCACGCCG 
ACACCTTGCG 
CAGACATGAC 
ATATATCTAC 
GCGGGAAATG 
GCGGGAGCAG 
ATGGCCTTTA 
TAAACCCATT 
TGTCGCCCAG 
GTTCAAGTGT 
TCGCGCCCGG 
TGATCCCGAA 
TACAGGCGTG 
TAAAACGAAA 
TTTACTTGAA 
AGTAATCACG 
GTGACGCGCT 
GCCTCGCACG 
AAGTCCTGCG 
GTGGACTTCT 
GGCTTTTTCA 
TTGCGTGGCG 
GAAAAACAGC 
TATAGTGTGT 
CTGATTGGTC 
TTCCAACTCT 
TTTTGCTTTG 
ATCCAGAGTA 
AAACTTGTCC 
AACAAGGCAT 
TAGGAAAAAA 
ACCCTTTACG 
TTGACTTAGG 
TACTCAGAAA 
TTGCAGCGTT 



GGTGGCCCTT 

TGCATCATTT 

TGATATTTAA 

GTTTTCTTAG 

TTTCCATTTC 

TGAAAAAGGT 

GGTATTCCAG 

GTATTGGTGG 

CTCTGTGATT 

TTTGTTTTTT 

ATATTAGCTC 

CGCTTCTTTG 

CTTCAGAGTT 

GAGTTCCCCT 

ATTGAGCTTC 

AGGGCCTTTG 

ATAAAGGGTG 

GCGCTTGGCG 

GAGCACCCCG 

GGCCAGACGC 

GTGGCGCTTG 

TTCCCAAGAA 

GTTACCCCTG 

TGACGCCTAC 

CGATTGGGGG 

TTTTCTTAAC 

CTGACTCCAG 

ACTGGAGTAC 

TTCTCCTGCC 

CGTGTTTTTG 

CTCCTGATTT 

AGTCACCGCG 

AGTGCTCCCA 

AAGGTGGTGG 

CCCTCTCTCC 

TGGCGTGGAT 

CCTCCTGCAG 

CAATCTCGCG 

GATAACGGCG 

CGCCGCCGGT 

CCTTGCCACC 

ACAGCGGAAC 

AAAGTGCAGT 

TATCTTTAAG 

ACAGATGATT 

TTCCCCAAGC 

GCTGGGATTA 

TCTTCTACAT 

TGATTCCAAA 

AAAAAAAAAA 

GGAATTTCTG 

AGTGTTATTG 

CATTTTCTAT 

CTGCAGCTTT 



CACAGACCAA 

CTCTCTCTGC 

TATACCACAA 

TTGTACAACC 

TGCGTAAAGG 

GCCTTTTCTC 

GAGTCTGAAT 

CGATAAATTT 

TTATTTCATA 

CTCAATAAGC 

ATCAGTTCTG 

TCCCTTGGAA 

GGAATATCGT 

AATCTTTCCT 

TTGACTTAAT 

GTTCAGAAAT 

CGTCCCTGGC 

TGCTCCGTAT 

CGAGTCTCCT 

CGGATGGCCG 

GCACCCCCCT 

GTGAACCAAG 

CCCCCACCTC 

AGTCCGCTCC 

AGGGTGGGGA 

AGAGCTACAG 

AATTATTTTA 

ATTAGAGCCA 

TCAGCCTTCA 

TATTTTTCGT 

CTGGTAATCC 

ACCGGCCGAA 

ATGCATTCCC 

CTCTGAAAAG 

GCGGATGCGG 

GGCGCACAGG 

AGCCATCACA 

CACCAGGCGC 

GATCTCGCGC 

GGCCGGAGCG 

AGTAGACTTC 

ACCCAACACT 

GATTGGATGA 

CCAGCAACAA 

ATTTAAGTGG 

TGGTCTTAAA 

CAGGGGAGCC 

CTGGTTTTCA 

GGTATTATAA 

AAAAAAGAGG 

AAACCTTTCA 

AAATCTACAA 

GAGACGTCTT 

TGTTTTCTAA 



CATTGCCTAT 
ATATATAAAA 
AATTTGCCCA 
ATCATCACAA 
GGGAAAAAAA 
TCTAAAACAG 
AGGGTTTCAA 
AGTATTGCTC 
ATCGCTAAAA 
TTCTTAGCTT 
ATTGGTTGAC 
ACTAATACAA 
TGCTCCCCTA 
TTTTAGGATG 
GGATACAGCT 
GCAAGCTGTG 
GCTTAAGCGC 
AGGTGACAGC 
CGTAGATCAG 
GCTTGGTGAT 
TACCCAAACC 
AGCAAGTGAG 
CAGCGGACAC 
TTTAACCCCT 
GATGAGGGTG 
GCTTTGAGGA 
AGTCGAACTT 
TCTCGATTCA 
GAGTGTACCT 
AGAGACGGGA 
GCCCGCCTCA 
ATCGATTGGT 
TTTTGTCTTA 
AGCCTTTGCT 
CGGGCGAGCT 
TTAGTGTCCT 
GCGGAGCTCT 
TGGAAAGGTA 
AGAGCCACGG 
CTTTTGCGGG 
CGAGCAGTTT 
AGCGCAAATA 
TAGAAGACGC 
TCGTGCAGTT 
TATTTTATTA 
CTTGGGCTCA 
CCACTGCGCC 
TAACCTGAAG 
TTCCCCAATT 
GAATACTGCT 
CAAGAATTGG 
AGCATCTCAA 
TCTCTTGATT 
AGCCTAGGTG 



GCTACCAACC 
ATATATGTGT 
CTTTAGGTAC 
TTTAATTTCG 
AAGGTTAACT 
ATTTTAATCT 
TTTTCAGGGT 
TCAGTACATG 
GATGGTTTTT 
CCCCTCCGGC 
AGCTACGAAT 
ATTTTTAACA 
CCCATATGTA 
TCAGCTCAGT 
CTTCTTTTGT 
GAGAAATCAG 
GTAGACCACG 
GTCACGGATC 
ACCAGAGATC 
GCCCTGGATG 
CTTCCCGCCC 
AGAATAGGAA 
AGAGACTGAA 
CCTCCAAGCC 
GGACCAAGCA 
ACTGGGTTAA 
TTTTTTTAAC 
CTGAAACCTC 
GGGATTACAA 
TTCGGCCATG 
GCCTCTTAAA 
TTTGAAGCCT 
AATTGGTTTC 
TGGACCGTCA 
GGATGTCCTT 
CAAATAGCCC 
GGAAACGCAG 
GTTTACGAAT 
TGCCCGGCCG 
CTGCCTTAGT 
GCTTAGTGCG 
CGCCCATGAG 
TAAATATGAC 
TCACCGGCTA 
CTACTATTAT 
AAAGATCTTC 
GGCTTGGACT 
GCTGTGTTTA 
CCGTATAACC 
CACCTCCTCT 
ATTCCTTTGT 
ACATAGTAGG 
ATGCTCTTTG 
TACTCTGCCA 



TCATGTCCTA 

ATGTATATAA 

AGTTCAATGA 
GCTGAAGGCC

CCCCTGAATT 

CTTTTTAATA 

ATTGAGGGAT 

TTTTTTCCTA 

TCCCTGGCTT 

GGCCCTCATT 

CTACTTTTTT 

GTGAGTGGAG 

ATCATTCATC 

TTAGTTGGGC 

CAACCTTAAC 

TCCATGGCAG 

ACGTTCTCCA 

CGCTTCACAC 

TTGTCACGCA 

TTACCACGTC 

ACCGATCTTT 

AAGCGCGCAG 

CCAGGAAATG 

GGCTTGACCA 

GAATTAAATG 

CGAATCTCTC 

TGCCTCTCAG 

GCGCTCGCCG 

TTGGCCAGGC 

GTGCTTGAAT 

TCAGTAGCAT 

TTACAGCTAC 

GAGAGACCAC 

GGGCATGATA 

TACCAAGTAG 

GTCTGTTTTA 

AAGCAGTTCA 

GTAGCGGTGG 

GGCCAACTGT 

AGCCATGACG 

CTGCTCTATT 

GTTACACACT 

CTATATTCTA 

TTTATTTTAC 

CCGCCTCAGC 

TTAATTTTTT 

TTTTCCATAA 

TTCAGCTCTT 

CCGGAAATGT 

AATGCTTTAA 

ATTACACTAT 

AATCCTAAAC 

GTCACAAAAT 
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119881 GGCGTTTCTC CAGCACTGCC GCCAGGTACC ACCAGCTGGG AGTTGTTCCT CTTGCGGAGC 

119941 AGGAGGTGGA CTTGGCCCAA GAGAAACTGG ATAGTGGTTC GCAAGGAACA TAATTTAGCA 

120001 TTGCCAAGAG CTAATGCAAT CATTTTGAAA ATCTCAAAAC ACTGAAAAGT GGATTGTGAC

120061 CTTTTTAAAT TCACAAGAGA CAGGCCACAT TCTATCTTTT GATTGGTTTA GGCTATTTTC 

120121 TTGAACAGCC ATTTAGAAAG CAGATCTATC ATCCTTCATT TGCATGGAGC GTTCCCATTT 

120181 TATTTGAAAC CAGTTTAACC CAATAGAAAA AAGGGAGGCA GAACCCATTA TTTAAAGTGG

120241 AAACTCCTGA ATCAGATAAT TAGGAGTATT TCCTTTTCAA AAGTTGCGTT TTTTCAGATA 

120301 CCTCGCTTAT TACACTAAGA AAGGTTTATA TCTTTCACAA AGGGTTTACT TACAAAAATC

120361 TTCCAATTTT GTATACCTGT GTTTCATAAC TGACTAGCCG TCAAACCAAG ATGTAGAGTT 

120421 TCCAACCGTT ATTTTCCAAA TTTTTAGAAA TTACGTGAAA TATTTGAATG CATGCCTTCT 

120481 CAATAAAATG GGACGTAGGA AGCACTGGTG CAGAAGATGG GTACAATACT TATCTGGGAC

120541 CACTCCATTA TTTGGTTGGC ACGTTGTTTG AAGAAAAAGG GGAAAAGCTC AGGTTACTTA

120601 GCATGGTTCG GACTTATTTG AAAACTACCA CAGCAGGAGC GGAAATAAGA CCGCATTACC

120661 TCACTCTCTG CTGTGCTGTG CTAGGGGGTT ATCCAGAATA GGATTGTAGA AGTGGATGTC 

120721 GATTTAATAG TTTTTTATTC TCCCATTAGC TGAGTCTCTG ATTGGCAATG TGAGATCGTT 

120781 TTAGCTTATT GATACTTTGA AATGCACTTA ACAGCCACAA ACAAGTTAAA GGGTTGTTAC

120841 CATAAAATCT TATCCCCAGG GTGTGCTTGC ATTTATCACC CGTGTTTGCT TTCACACTAA 

120901 GTGGACTTAA CTCCCCAGCA GAATGCCTGT CAGGGAACCG GTTTCGTGGA CCCAGCATTT 

120961 AACGCCTTTC GCAGGCTTGT GAGGCCCATA AATATTTGTT GAATAAAAGA ATGAGTTGAC 

121021 CATGTCATGG TGCGCTGATT GCGTGTGCTG ACATGGAACA CAGGTTGTAA ACCTTAATAC 

121081 CAATTTGGGG CATGTTGTAT GGATGAAAAG GGCATTGGAA ATTCCTGAAG TGCATCCCAC 

121141 ATTGGACTGT GGAAATAAGT TGCAAGTGCA GAAACGTTTC CACACTTGCA GTTTGAGTAT 

121201 TAATTGCAGC GTTTGTGAAT TCTGGTGTTG TCTACGATTC ATTCTTGTTT GACGTGAAAG

121261 GTATTCGCGA GACACATCGC TCTAAAACAT TGCCAGAAAA TGTAATAGAG TTGATGACAA 

121321 CTGGCCCTAA CACGGCCTAA AACTCGCACT TTTCTCTCCC TCCGCAACTA TTCAAAACAC 

121381 TGTATTTTAC ATTTCTTGCA AATTAAAAAC TAACATCTCT GGCAACGGAC CTCTAAAAAT 

121441 TTCTAATAAA ACTCCTCGGA TGCTTGTGGC ACTGCATTTG TAAACCGCCC CCTCTCAACC 

121501 TACTCCCTAA AAAAGAGCTG CTTTTTGAGA GAGAAGCGGT ACCCTCTGAT GTTACTGGGC 

121561 GGCAGTCTGC CTACAATTTC CTTCACAATG AGGCAACCAG AGCGGCTTTT TCTGTGTGTT 

121621 TGCTTGCGTT GAGGGGAGCA GGACCATAGG CCCTAGAGGC CCCCAGCTGC CTTCTGAGAC 

121681 TGGGCGAAAC CCTCGGCAGC GCGCAGGGGG CGCTAGGGCG CGAGGGGCGG GCACTGACGG 

121741 GCACCAATCA CGGCGCAGTC CCACCCTATA AATAGGCTGC GTTGGGGCCT TTTTTTCGCA 

121801 TCCTGCTTCG TCAGGTTTAT ACCACTTTAT TTGGTGTGCT GTGTTAGTCA CCATGTCTGA 

121861 AACAGTGCCT CCCGCCCCCG CCGCTTCTGC TGCTCCTGAG AAACCTTTAG CTGGCAAGAA 

121921 GGCAAAGAAA CCTGCTAAGG CTGCAGCAGC CTCCAAGAAA AAACCCGCTG GCCCTTCCGT 

121981 GTCAGAGCTG ATCGTGCAGG CTGCTTCCTC CTCTAAGGAG CGTGGTGGTG TGTCGTTGGC 

122041 AGCTCTTAAA AAGGCGCTGG CGGCCGCAGG CTACGACGTG GAGAAGAACA ACAGCCGCAT 

122101 TAAGCTGGGC ATTAAGAGCC TGGTAAGCAA GGGAACGTTG GTGCAGACAA AGGGTACCGG 

122161 AGCCTCGGGT TCCTTCAAGC TCAACAAGAA GGCGTCCTCC GTGGAAACCA AGCCCGGCGC 

122221 CTCAAAGGTG GCTACAAAAA CTAAGGCAAC GGGTGCATCT AAAAAGCTCA AAAAGGCCAC 

122281 GGGGGCTAGC AAAAAGAGCG TCAAGACTCC GAAAAAGGCT AAAAAGCCTG CGGCAACAAG 

122341 GAAATCCTCC AAGAATCCAA AAAAACCCAA AACTGTAAAG CCCAAGAAAG TAGCTAAAAG 

122401 CCCTGCTAAA GCTAAGGCTG TAAAACCCAA GGCGGCCAAG GCTAGGGTGA CGAAGCCAAA 

122461 GACTGCCAAA CCCAAGAAAG CGGCACCCAA GAAAAAGTAA ATTCAGTTAG AAGTTTCTTC 

122521 TAGTAACCCA ACGGCTCTTT TAAGAGCCAC CTACGCATTT CAGGAAAAGA GCTGTAGTAC 

122581 ACAGATGAAA TCCCCCAAGC AAATGCAACA CGCCCTCAAT TATATTAGAA TCACTTGGAG 

122641 AGTCGATAGA ACTTTAACAT AGCCTCATCT AGTAAGAATT TACTACTCAA TCTATCAAAG 

122701 ATAGCAAGGT GAATTCAAAT GCACCGAGTT AAAATCGAGT TTTAAAGTCA CCTGGGTTTC 

122761 GGTAGCCGGA AGTCCCGCGT CTCACGACTC CAAGCTAATT AGTCATAACC GTATTGAACC 

122821 AAGGTTGAAG CCCAGTCCCA GGCTTGAGGC TTTTTATTAT ACAAGGTTAA AGTGGGGATA 

122881 TTGCGTTTTG GGGTCAATAT TGCTAAAGTA GCATTTTCCG AAATTGGGTG GTCCTAAGAA 

122941 ATGCTTCTGG GATAGTTGGC AAAATATATG GCTTAACCAC GCCCTCTCCA CAGGAGTGGC 

123001 TAGCGAGCTG TCTGTCCTTG GGAAGGACGG TGACCCTGCT GGCGTGGCTG GCGCCCACGT 

123061 TGGCGTCCTC TGAAAGCCCC GCCAGGTAGG CCTAGCTCGC TTGCTTTCTG CAGCGCCATC 



Figure 9 (Page 38 of 74) 




WO 98/14466 



PCT/US97/17658 



123121 ATGACAAAGC TTTGAAACGC AAAATGCTTT CTTTGTGCAG CGCCTTACCA TGGGTGCACT 

123181 TACGGGCTGT CGACTTGGTT TAGGCCCTTG TCAGGACAAA GGAGCTTAGT TTGTTGGAGT 

123241 TTTAGAGCTG CAACCCAAAA TCCCTTGCTC GGTTTCTCTG TTTTTAGAAA CGGAAGCGCC 

123301 CTGATTGGAT ATTTGAAAAT TACTGTGCTT AACTGGATCG TGTTTCATCA ATCGTGCAGG 

123361 ATTTTCAACC CTGGTGGAGC CCACACATTC AAAACTGAAG ATCCTTTTCT CAGAACTGCC 

123421 CCTTTAAGCT TTTGCAATTT TAATTCTGGG GGTCAGATTT TAATAATTGG ACTTTTTTGT 

123481 TTACATCTGA CAAGAGTATA TGATGAGCCA AGTTTACTCA CTTTTACTTA GTGCAGTTCA

123541 ATTCTAAAAG TTTATTTTTG CGTGTGTGCA TATGAGTTAA TAATCAGTTG TATTTTTCAA

123601 ACGGTCTTTT TTCAATTGTT TTGCTTAGCT CCTTCCATCG TCTAAAGTCA GGGATACAGG

123661 CACATCACAT CCCTGTTCCC CCTTCCTCAA ACTAATATGT AGCTACCTAG GTTTATCCTT 

123721 TAAAACAAAA ATTCTCACCT ATTTTTGTGA GAAATATACA TGTTTTTCTT TGAACTAAGT 

123781 ATTTTACATA CACCTATCTA TATACATGCA TACTTGTGGT TTTGTTTTTT TAAAAAAAAA

123841 AAAAAAAAAA CACGTTATCT TTTGAGACTG GGTCTCAGTC TGTTGCCCAG ACTGGACTGC 

123901 AGTGGCATAA TCACAGCACA CTGTAACCTC CAACTCCTGG GCTCAGGCTA TCCTGCAGCC 

123961 TCAGCATCCG GAGTAGCTGG GATTGCATGC ACGCACCACC AAGCCGGGCT TTTTGTTTTT

124021 ATTTTTTGTG GAGACAGTCA CACCATGTTG TCCAAGCTGG TCTAGAAATG GCCTCAAGTG
124081 ATCATCGACC TCCCAAAGTG TTGGGATTAC GGTCACTGTG CCTGGCCTTG TATGCATAAT 
124141 TGTTTTGTCT TTTGATTAGG GTTATTAATT TAAAAAACAA AGCCTGGACG CAGTGGCTCA 
124201 CATCTGTAAT CCCAGCACTT TAGGAAGCCG GATGGGCAGA TTACTTGAGC TCAGGAGTTC 
124261 AAGACCAGCC TGGGCAACAT GGTGAAATCC CATCTTGACA AAAAATACAA AAAATTAGCA 
124321 AGGCCCAGTG GCACGCACTT ATAGTCCCAG CTACTTGGGA GGCTGGGGTG GGAAGATGAC 
124381 TGGAACCTGG GAGGTAGAGG CTGCAGTGAG CAGAGATCGT GCCACTGCAC TCAAGCCTAG 
124441 GTGACAGAAT GAGACCCAGT CTCAAAACAA AAATAATAAA AATTTTTTAC AACGATGTTA 
124501 TATACACTTC TGCATGTTGC TTTTCTCTTA ACCAAACTTT TCTAAAACCC TGTCATGAAA 
124561 AAAGAAATCC TTCACATGGA ATAGCATAAG TTATTCATCC ATTTCTTATT GATAAGCATT 
124621 GATGTTTCCA GTTACCACTG CTGAACATGG TGCAATTGAA TAGAATTCCA GGGCTGAGAT 
124681 TGCTAGGTTT TAGGTTGTAT TTTATTATTT TATTTATTTA TTTATTTATT TAGACAGAGT
124741 CTTACTCTGT CACCCATGGT GGAGTACAGT GCCATGACCT CAGTTGCAAC CTTTGCCTCC
124801 TGAGTTCAAG CGATTCTCAT GCCTCCGGTC TCCCGAGTAG CTGGGATTAC AGGCACCTGC 
124861 CACCAGGCCT GGCTAATTTT TGTATTTTTA GGAGAGATGG GGTTTCACCA TGTTGGCCAG
124921 ACTGGTCTCA AACTCCTGGC CTCAAGTGAf CTGGCCACCT CGGCCTCCCG AAGTGCTGGG 
124981 ATTACAGGTG TGAGCCATGG CTCCAGACCT GGACTTTGTC TTCTGTTTCA TCAGTCCTTC
125041 TGTTGGTTCA AGCACAGTAT CACACTGAAG ACTGATGATT CTATATAAAT ATGGTAAAGA 
125101 CTGTACACCC TAACTGTTCT TATTTTTTAA TTTTAAGGCA ATTTTAGATT CCAGCTTTCC
125161 AAAGAATTGT GGAATGCTTA GAGCTAGAGA AGCCTTGGAA GTCATTTAGT TTTTGTTTTG 
125221 TCAGAGAAAA TTCTGTAGAG ACTCTGTCCT GCTCTCACTG AATACCATCC CATAGTACCC 
125281 CCCAACAGCT TTAAAGGGCA ATAATACCTT ATGGACAGTA TGCTTTTCCT CAAATATATT 
125341 CTAAGCCATG GTCAATGCAA AAGAGTGAGA AGGAAAGTAG AATAAGTTAT CTAAGAATCA 
125401 GTGGGTGCTC TCTTTAAACT GATTTATCAC TCCCCCTTCC AAACTCTCTT GAAGGTCACT 
125461 CTGCCTCCCT TTCTACATAA GAACTCCTAA CTCCAAGGGA GGAAGGTAAG TTATTCTTAT 
125521 TCCTTGCTTA GAAAAAGAGA AAATAGGTTT GGTAAGCATC CGCTTTCTGC TACCATTCTC
125581 TGTGTTTCTG TGTTTTTTAT AGGATCATTC AATTATTGGT TGGCTCTTGA GAGGGAATGC 
125641 AAGGTTCAAG GACACAAGCC TAGATCTTGC CTGTATAGAA CCTCATGATG TTATGCTTCT 
125701 CTAAAATGAG GCCTGGAGGA GACATGTTGA AAGTGACCCA TAAATCTGCA GTATCTCATG 
125761 TCTCTCAATG GGGACAAGGA GTACCATGGG AAATAGCATT AGGTCAATGA CAGTAACAAC
125821 TCCCAGGTGA GTTGATTTAT TCTTTTATTT ATAAAGTTGT TAATATGCTA CATAGTCCCT 
125881 AATTTTGCCA CAAATAGTCA TTATTTTAAT TTCATATTTC ACTATTGATA AATGAAGGAA 
125941 AAAATGAGTA GCAGTTAAGC AGTCCATAAA CCTACATATA AAGCAAATTG GAGATTTTAA 
126001 AATTGATTCT GGATGCTTAA AATCCTTCTC ATTGAAAAAA AATTTCGTAT TAGAAGATTT 
126061 CAACATTCTT TAAACTGAGA AGCATAACAT ATAAACAGAA AACCACAGCA AAACAAAAAT 
126121 GCAAAGCTCA ATAAATGAAC ACAAAGTGAA CACCATAATA ATTGCCACAC AAGTAAAAAA 
126181 ACAGAAAATC AGCCAACCCT CCCAGAGCTG CCTGATGCTT GCTTCCAGTC ACATTATCAC 
126241 TCCATCTGCC CTAAACATAA CCCCTATTTT GATTTCCAAT GCTGTAATTT AGTATGCCTG 
126301 TTTTTGAAAC ATATAAAATG GAAATAAAAC AAATGTAATC CTATGTACCT GACATATTTC 
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ACTCCAGAAC 
TTTTATTGTT 
GCATATTTGC 
GTCTTGGTAT 
CAAATGCTAA 
TAAAACCACA 
GCTGGGAATT 
TCTTTTGAAG 
TTACATCTTT 
AACAAAAATA 
GGCCAGGCCA 
TTATTGACCT 
CCAAATAATT 
TGGTCGAGGG 
ATCTCATATT 
CATAAAATGA 
GGCCTACTCT 
CTCCAAATAT 
TAGGCCAAAA 
TGTCCTTTCT 
CGCTGTCTCA 
GTAGCTGGGA 
ACGGGGTTTC 
GCCTCGGCCT 
TCTGTTTTAA 
ATTTCCTCTG 
CTCACTGCAG 
GTGGGACTAC 
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ATTAGGTTTG 
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TTCCCTTTTT 
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GTTATAAACA 
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TGCCCATTTT 

CTAGACTAGG 
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GAAGATAGTT 

TATAAAGAAT 

GCCCACCTCT 

CAAACCACAG 

CCTCATGTCT 
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CTACAGGTGC 

ACCATGTTGG 

CCCAAAATGC 
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129601 GCCATGGTTG ACTGCTCACA TGGCCGATCT TTTAGTCTAC CTCCACAGGT AGAGCTGATA 

129661 CTGTGTGGCT CAAAGTTCCT ATTATAAATC ACATTGTTGA CTGTGTGGTG GTCAAAACCT 

129721 CCAGGTAAAC AAAGACACAC TTATCAGTGA GAACATTTCA AGGGTCTAAA ATTCATCTCC 

129781 CAGTAGCTGA GGGCAAAGGC TAGACCTCTT TTTGGGTAAG ATAAATTTTT TACCATATAC

129841 TTTATTTTGC TTTTCATGTT TAACTTTATT TTGCTTTTCA TGTTAGTTCC CCTGGAATTG

129901 TTTTTTGTGT ATAGTGTGAA GTAGGGGGTC AAGTTTCTTT TTTTTTCCTT TTTGTTCTTT 

129961 TTCTGTTTAA AAGGCTATAC AATTGTCCCA TGCCATTTAT TTACAAGAGT CCTTTCACCA
130021 TTGTTGTATG GTGCCACTTT AGATGTAAAT CAATGTCCAT ATTTGTTTGA GCCTGTTCCA 

130081 TTCGTTTGTC TATTTTTGGA CAACACTGCC CTGATTATTG TCATTTTATC AGTTTTGATA
130141 TTTAATAAAG CAACAGATTT GTTTATTTTG GGCCCTTGGA TTTGTGTATT AAATTTGAAC
130201 CCTGTTTGTC AATTTCTATA ATAAAGCTTA TTGGGAATCT GATTAGGATT ACAATGGTTT 
130261 TGTAGATCAG TTTGGGGACA ATTAATACCT TTAAAATATT GACCGCTTCA ACTGTAAATA 
130321 TACTCCTCCA TTATTTAGTT TTCCTGTTTA ATTTATCTGA GTAATACATT ATAGTTTTCT 
130381 TCGTAGAAGT CAGATACGTA GAAAATTCAA AGCCCAAGTG CAATAGCTCA TGTCTGTAAT 
130441 ACCAGCACTT TGGGAGGCCG ATGTGGGTGG ATCACCTGAG GTCAGGAGTT TGAGACCAGA 
130501 CTGGCCAACA TGGTGAAACC TCATCTCTAG TAAAAATACA AAAATTAGCT GGGTGTGGTG
130561 GCGGGCACCT GTAATCCCAG CTAATCAGGA GACTGAGGCA GGAGAATCGC TTGAACCCAG
130621 GAGGCAGAGG TTGCAGTGAG CCAAGTTCCT GTCACTGCAC CCCACCCTGG GCGACAGAGC 
130681 GAGACTTCGT CTCAAAAAAA CAAAAAAAAG AACATTCAAA TAATCAATGT AGATAATTCA
130741 AATAACTAAA AAATGAACAG TTATTAAAAT ATCAGGATAT AAAAGCAAAA AAATCAATAA 
130801 CCTCCATATA TACAAAATGG CCAGTTAGAG AAAAAAAAAA GAATAGGCGA GACTTAAAAA 
130861 GGCTGGGAAT CTCCCTGAAA ATCTTTGAGA GCCTTGGCCC TGCCCTCAGG GATTTCTCTG 
130921 GCTTCATGCC CAGATATGGG TACAGTTCCT TGTTTAAAAA AATTTTGCTC CATCAATCAA 
130981 CAAGGGGCTC CTTCCTCAGA GCACAAGGAC CTCCATAACA CCGGACACTA GATGTCTAAG 
131041 GGACACCTCT TAAGGAAGTT AGACTTCCAA AGAATGGTGT TTCCTCTGTC CCCAAACTCT
131101 GGAACTCACA GCACAACTGC TCCTTGGAGT TCGGTTTCAA ATCTACAAGG CTGTCATGGA 
131161 GGTTGCAGAC CAAGTCCGTG GCCTCAGTGT CCGGATGTAC GGTGGCCTTG GCACCTGAAT 
131221 GTGAGAACAT GACCTCCCTG AAACCACCAC AAGTATTGTT TCATGTTATG TATGTTTTTT 
131281 CTTATCTGAA ATTCCTTTTC TTTAAAAATT CAAATTACAT ATTTTTCAAG CCCCTGAACA 
131341 AGCTTCATGA GCATTTATTG AACCCACAGC TTTTAAAACC TACTGAACAC TTTGCTCTAT 
131401 GTTGTCATTC ACTATCCACC AATTATTTAA TTATTGATCA ATATTGTTTC CTTAGTGTTG 
131461 GGATCATTTA TGCATGTATT TCTTTTATAT TGCATATTTT ATATTTCTGC ATTACAGTTA 
131521 TTACATATTA CTTTTGCTAC AGTAATAGTT CAGAAGTGTA CATCCAAAAT TTAGCTGTGA 
131581 AGTGGATGGA CTGAGGCAGA ACTGGAGGCA AGAAAATGTC ACAGTAATTC TAAAAAAGAT 
131641 GATGTACAAT TAGAGCAAGA GAGTAGCACT GAAATTGAAG AAAAATAGAT GCGTTTGAGA 
131701 GAAAATTAGG AGGTAGAATC AACAGATTAG ATGTAGGGAT GAGAAGGGTC AAAGATGACA
131761 CTAGGGTTTT TAACTGGAGC AAGTAGGTAG ACAGAACATT TCTTCCTGAA AGGGCAGGTC 
131821 AGATCATGTG TTGTCTCAAA GGGCATGAAG AGTAGAAAGC CTGGGACAGA TCCTGAGATG 
131881 ACCAATACCC ATGGTGCAGG GAGAGGGAGG GAGATCTGCT AAAAAGACTG CAAATGTCAG 
131941 GATAGTAGAA AATCATGAGT GTGTGATGTC CTGGAAGTTG AGACAGTATC ACATTTGAGA 
132001 ACATTTAAAT TGGTAACTCT GACAAAACCT GGAGGCCAAC TGTGAATGCC CATGAGAGTG
132061 AGAAGCTCCC ACACTTTTGT GGGCATCAGA AAGCCCACCA GGTTCCTGCA GTGAAGATCT 
132121 GAGAAGGATC CTCTTGTGGC TTTGGCAGGG AGAGAAGAAT TATTATGAAA TACACCCCAG 
132181 AACCTTCTTC AAAACAAAGG CCTACTCTCA AGGGGAAAAC ATTTTGCCAG AGTCTTATCC 
132241 CAGCTGGGAG AAGGTAATTC TTCCCACTGC AGCCTCATCT AGGCTTTCTG TCTCACTTAA 
132301 GGGAAGAAAA TTAGTCAACA GGGATCAGAG CTTCATGAAA ATAAATTGGA AATGGTGCAG 
132361 CCAGGAAAGG AGCAAAGGTC TGAGGAGGAG GAGAAGGAGG AAGAGGAGTT GTATCATTAT 
132421 AAATACTTGA GGAAGAGGAG GAGAAGGAGG AGGAGGAGGA GTTGTATCAT TATAAACACT 
132481 TGAGGAAGAG GAGGAGGAGA AGGAGGAGGA GGAGTTGTAT CATTATAAAC ACTTGAGGAA 
132541 GAGGAGGAGG AGAAGGAGGA GGAGGAGGAG TTGTATCATT ATAAACACTT GTGACGGTCC 
132601 CAGCCCCAAG ATATAGGCAT GCTAATAAAC TGAGGCTTAA CACTTTGACT ACAGAATGCT 
132661 GCTTCTCCCT AACACCATCA AGGCTCCAAC TGAATAACAA TGAATTATGA ATGAAAGAGC 
132721 TGTAAGGAGA GACAAAAGTT AGAATGAGAC AAGTATTGTT ATCTAGAGAT GCCAAGAAGG 
132781 CAAGGAAGAT AACTAAAAAG GCACTCTGGA TTTAGAAATA GGAAGTCATT AGTGACCTTG 
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132841 TAAATAATGG AGCCAGAGGA ATACCAAGGG CAGAAGCCTC ACTATAGTGT GTTGCACCTG 

132901 TCAGAGGTCA GGAGGTGTAA CTGACTCTCC CACAGTGTGG CTTTGGAAGA GAGAAGTCAG 

132961 CAGCTGCATG GAGATTTGGG AGAGGGAAAG CTTTTTTTTT TTTTTTTTAA TTGGAAAAGA 

133021 CTGAGCTATG TGTAAATAGA ATAAGACAGG AAGAGTGTAG ACACAGGAAA GAGGGCAGAC 

133081 AAAAACAAGT GCACAGTTAT CTAAGGGAAA CAATGGGATC AAGCTGCAAG TATATAAACT 

133141 TGTCTTGATA GAAGAATCCT TGATCTGGTT TATTCAGTGT TTGGTCCAAA CCCACATCCC 

133201 TGTTCTGCCT GTCTCTGACT TGCTCTGTGC CCCAGAAGCC CAGCTTCTAC AGATAGCATT 

133261 AGCTGGGCAG CCCTGCCCTC TTGCAACAGC TGGATTTGGC CAGTGATCAG CCCAGCAGGA 

133321 ATGTAGATGG CAAAGGAGAG AGAGGTTAGT GTACTTATTC CCTGCATCAC CCCCCTGCTT 

133381 GGTGGGCAGC TCTTCCTCCA CAGTCCCAGC TCTGGCCTAG CTCTGGTTAC AGGTTCCCTC 

133441 CCATTGCCTC TTCAGATTTA AAGGTGTGTC TGTCAGGGTA TAACTGGGAG CTAGAAATTG 

133501 CACTGAAATT GAACAAAGAA TTTTATGGGA ATGGTTGTTA ACTAGTTATA AGAGGACTGA 

133561 AAATGGAAAA GTGGAACAAA CGTATCAGAG ATAGTAATGA CAGAAAGCAA CTACCACCTC 

133621 CAGGTTTAGG AGAACAAGGA AAAGATTCTT TGAAGAGATC CCCAGAACTG GGACCTCTGA 

133681 GGAGTGTATG CTGGACCACT GATGATGATA TGTCTGTAGA TAGAGGCATG ATGAGGCTGA 

133741 TTTTAGGAGC ATGGAAGATC TCCAAACTGA AGCCAACTGC TGTTACTGGA TTCAACTGCC 

133801 ACTGCCAGGT TGAAGAACCC ATTCTGTGAG GATGTCAACA AACAAAGTGG GAAATCTTTT 

133861 CACATCCTTC CAGCCCTCTA GTCTTCCTCC AGTGCTTTCT ATTGGTAGGG TTTGGGGAGG 

133921 TGGCTAGCAA AGCGGTATTG GAAAAGATAG AAGAGACTAA ATCTTCATAA CCAGCACAGG
133981 GTGACACTGG ATCACTACTG TTGCTGATCT TGGGCTGCCT CATATCCCCT GTTCTTCCCA 
134041 TTAGCCCTGT CACAACTTTG TAGATATCCC TTCATTATAT GCCCTTCATA TATTCTTTTG 
134101 GTTTAACTTT TTCTGTTGGA ATCCTAATAT GGCACTCCTC CATTTTTCAG GACCAAAAGA 
134161 GTATAAAAGA TTATCTTTTA CCAAAAAAAA GACAAAAAAC TGATCTAATT CCTGATTTGA 
134221 TCATTACACA ATCTATACAT GTATCAAAAT ATCACATAGT ACCCCATAAA TATATAGAAC 
134281 TGTGTCCATT AAAAATAAAA ATTAAAGAAA AGATGGTAAA TATAGCTCTG TCAGGCAGTG 
134341 GAGGTTTTAC CACGATGGCT GTTATTTCCC CCATGAAGGG GGGAGTGAGG GAGCAGCTGA 
134401 AAGTAGGTGC TTATAGGGGT ATAGAGGGGC TCAAAGCTTT GAGAGAGGAG AATGTCTGAA 
134461 AGAGCTGCCA AATAGCATGC AGGTCCCATG GGGGCAGAGC CTCTGCTCAT TCACCAGTGC 
134521 CTCTTCAATA TCTACACTTA AGCCTAACAC AAAGTGTGTG CTTAATAAGT ATTTGCTGAG 
134581 TATGTAAAGT GGAAACAGAA CCAATCTGGC AAACTTTGTA GGACTGGTGG GCAATGAAGA 
134641 TCAGTCAGGT AAAATCTGTG GATATAAATT TATATTGATC AAAAAATTCA AGGTTAGGTG 
134701 TTTTTCTTCA GTCATGCTCA ACGATGCTTC AGCCATGCTC AACTCTTCTG TAGCCACAGA 
134761 AAAAAGTTTA CCCATAATCG AGCTGTGTCT GTGTCTGAAT AATGAAAAGA CCATGATGCA 
134821 AGGGAGTTGG AGACACAGAA ACAGTGTTTG AAGTAATGGG TAATGGAAGC ATGCTACCAG 

134881 GGAAAGGAAA GAAGTGGCAA TAGGAAGGAA CAGAGATCTG TGGTCCTATG TCCCCTGAGC
134941 ATATTCACAT GTTAAAGCTA ATTCAGTTTT CAATCATCAT TAAAATTTTG TTCCTAAATA
135001 TATGGCCATT ATTTTCCACA ACCACACTAA AACTTTATTA CCTCTGGCAA GTGACTATGC 
135061 AAGTAACTAA GAGCAAAAAT ATCCACAACT ACCATTTGAG CTATCAATTT AGGGAAAGTC 
135121 ATCTGGCTAT AATCTAAGTG ACCCTCCACT GAATGTCAGT ATCTTTGCAT ATGTGATTTA 
135181 AATCTGGGCC TTCGCAACAC CATGAACTGT TCTTGTCTTG AATATCCAGA TTGAAGGAAA 
135241 TAATCTGAGT AGTTACGAGT CCTGAAGCTA GAAAGATGGA AACCCCATTT GCTCATCAGA 
135301 AAGCCTTAGA GCTTGGGCGC TGGCGGGTCC TGTCTCACCG GGACAGAGGG GCTCTTTCCT 
135361 CCCCATCTGA TAGTCTGATA ACTAGAGAAG CCGGCCAACT TATTCTCCAA GAAGGAGCCA 
135421 TCTTAGTTCC TCCTGAAATG TTCATATTTA GAAATTATTG TTTGTCAGTA ATTTAACCCC 
135481 TTAATGGGCT TGCCTTGTGG TCCATACCAC TGAGTGCAGA GCTTGCCTGG AAGAATTGTG 
135541 AGGGCCATTC CATCTTCCAG GCAGTAGAGT TCAGTACTTC TTTAAAATTG CTGCTGAACT 
135601 CTGTATTTGA AAAGAAAGAA TCATTTGGGT GTGGTAGCTC ACACCTGTAA TCCTAGCGCT 
135661 TTGGGAGGCT GAGGTGGGAG GATCATTTGA TGCCAGGAGG ACCACTTGAG ACCACCCTGG 
135721 GTAACATAGC AAGACCCTGT CTTTAGAAAA AAAAAATACA ATAAAATAAA TACAATAAAA 
135781 ATAAAAGCAA AAAGAAAGAG TCCATCTTAG GGACAGACTG TAACTACTCA CTGGAGCTTA 
135841 CCTTTACATA GTTCAGGATC AATTATAATA AAACACTTTT GTGCAGATTC AATAGGATTA
135901 TTTTAATCCC CATCATCTCT CTGAGTTTCC AGTCAGTTTC TCTGCATGTA GACACCCTTC 
135961 TCCAGCCCAC CATTGTCTCT CCTCCTATAG CTCCACCAAC AAATCAGAAC TTTTTCTAAC 
136021 TGCACCTAGT GCACCTAGAG TCTACTCCAG AATGCTCATG GAGAAAGTTT CTGAAAGGTA 
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139321 AAATGTTACT CAAAAAAATA CAGAGGACAT ATGTGGATAG ATAATGGAAG AGATAAGATA 

139381 GGTAGGTTGA AGGGTTGGGC TGCCCCTCCA CACCTGTGGG TGTTTCTCGT TAGGTGGAAT 

139441 GAGAGACTTG GAAAAGAAAG AGACACAGAG ACAAAGTATA GAGAAAGAAA AAAAGGGGTC 

139501 CAGGGGACCG GTGTTCAGCA TACGGAGGAT CCCACCGGCC TCTGAGTTCC CTTAGTATTT 

139561 ATTGATCATT ATTGGGTGTT TCTCGGAGAG GGGGATGTGG CAGGGTCAAA GGATAATAGT 

139621 GGAGAGAAGG TCAGCAGGTA AACACGTGAA CAAAGGTCTC TGCATCATAA ACAAGGTAAA 

139681 GAATTAAGTG CTGTGCTTTA GATATGCATA CACATAAACA TCTCAATGAC TTGAAGAGCA 

139741 GTATTGCTGC CAGCATGTCC CACCTCCAGC CCTAAGGCAG TTTTCCCCTA TCTCAGTAGA 

139801 TGGAATATAC AATCGGGTTT TACACTGAGA CATTCCATTG CCCAGGGACG AGCAGGAGAC 

139861 AGATGCCTTC CTCTTGTCTC AACTGCAAAG AGGCGTTCCT TCCTCTTTTA CTAATCCTCC 

139921 TCAGCACAGA CCCTTTACGG GTGTCGGGCT GGGGGACGGT CAGGTCTTTC CCTTCCCACG 

139981 AGGCCACATT TCAGACTATC ACATGGGGAG AAACCTTGGA CAATACCTGG CTTTCCTAGG 

140041 CAGAGGTCCC TGTGGCCTTC CTCAGTGTTT TGTGTCCCTG AGTACTTGAG ATTAGGGAGT

140101 GGAGATGACT CTTAACGAGC ATGCTGCCTT CAAGCATTTC TTTAACAAAG CACATCTTGC 

140161 ACAGCCCTTA ATCCATTTAA CCCTGAGTTG ACACAGCATA TGTCTCAGGG AGCACAGGGT 

140221 TGGGGCTAGG GTTAGATTAA CAGCATCTCA AGGCAGAAGA ATTTTTCTTA GTACAGAACA 

140281 AAATGGAGTC TCCTATGTCT ACTTCTTTCT ACACAGACAC AGTAACAATG TGATCTCTCT 

140341 CTCTTTTCCC CACAGGAGGT GATGGCCGGA AGAACATGGC AGAGGGCAAA ACAAAACAGC 

140401 ATTGGGAACA AGCTCTGTTT AAAAGGAGAC TTGTGAACAG CAAAGAGTAG AAAGGGTTCT 

140461 CTTACAACTG AAGCCCATGG AAGACAAATG TGTACTGCGT GAGTTTTAAG GCAATAGGAG

140521 TAGTGGGACC TAGGGCACAC CAGAGAGCAT ATTAACTCTC AAACTTTTAA AAACATTATA

140581 TCTGCTGGAC ACAGTGGCTC ACACCTTAAT CCTACAACTT TGGGAGGCCG AGGCGGGCGG 

140641 GTGTAGCTTG AGCCCAGGAG TTCGAGACCA ACCTGGGCAA CATGGCAAAA TCCCGTCCCT

140701 ACAAAACAAA CAAACAAAAA ACAAAATTAG CCAGGCACGG TGATGCGTAC CTGTGGTCCC

140761 AGCTACTCAG AGGCTGAGGT GGGAGGATCG CTTGAGCCCC GGGAGGTTAA GGCTGCAGTG

140821 AGCCATGATA ATGCCACTGC ATCTCAGCCT GGGCAACAGA GGGAGAACCT GTCTCAAAAC 

140881 AAAAACAAAA ACACACCATA CCCAACCACA ATGCATCTGT CTTAAGTACC AGTACCACAC 

140941 CCCTCTACTC ACTACTAAAT AGGTGAGTTC CCAATCCCTG GTAGCAGGTT TAAGCATGTT

141001 ATATTAAAGG TCTTAGGCTA GTGACTCATT CACTCATTAA ACAAATACTT ATTGTGCATC 

141061 TACTATAAAC TAAGTACTGT GCTAGGTACA AAAGCAAATA ATCTAAGCTC TATAAACTTT 

141121 ACTTTCTTCA TCAACAAAAT GGAGATGTTT TAGGCATCTA CTCATCATTC TGAGCTCCAT 

141181 CTTTTGTGAC TGTAGTTGGC AGAGCTTTTT ATCAGTTTCT CTAAATAGCT CTACCAGTCC 

141241 CTGGTGGATG CTGGCATGCC CAAAGGATCC ATCCTGATGG CCCTGTCTGC TTACCTTACC 

141301 TGCCTGCCTT TGCAGCACCG CTCTGCTCTT CTGCAGGACT TCCCTTATCC TTTGGGGTCT 

141361 TGCTGCTCTT AGGCTGCTCT GCTTGTTTTG ATCTGCTTTG CATCACATGT ATGTAAAGGT 

141421 CCTTTCCTTA TTTACCCATG ACCAAGGTAT TATGAGATTC TGGAATTTCC CCAAACCACA 

141481 TTGATTGCTG GGAGAATAGA AGAAGTGGAT TACAAGTGGA ACTTAGAAGG GGAGTATTCG 

141541 AGAAGACGTC TCTGCAAATC CATTTAGAGA GACCTTTCTC CAGTGGTGAC TCAAAGATGC 

141601 AGCTCCTTTC ATCCTGTGGC TTGGCCATCT TCAGCACATG GCTCCCAAGG ATGTCCTCAG 

141661 GATGGTCTCT AATCCAAGGA GCCTGAAGAG AAAAAAAGGC ATGGAGTATT GTGAGTGGTA 

141721 GGTGGTTATG GACCAGTTAT GGAAGAATAC ACATCACTTT TGCCCACCTT CTACTAACCA 

141781 GAACTCACAC AGCCATAGAC ACTGACAAGT AGGACTTAAC AAGAATCTAA TTTTGAGTCT 

141841 AGGAATACGA CTGTAGCAAA TATTTAACAG CTTCAAACAC AGGTGCATTG CTATCACTAT 

141901 GCTTGGCCCA GGCCTGTCTC CCTTTCCTGC CATGTCACAG GGGCCAGCAT TTATGTCTAG 

141961 ATTGGGTTGG TTGGGATATT AAGACAATAA TGAACCAATA CAACATCTTG AGCATAAAAC

142021 CAACTGATAC AATGATGTAC AAGTCAGATG ATTCTGATGA TTATGAATTA TGTCAATAAA 

142081 AGAAATGTGA TAACTAAGGT AATTTTTGTT TTGGCAAATT TTTGTTTGTT CATGACAGGA

142141 TGAAATCCTG TCATTTGTAG CAACATGGAT GGAATTGCAG GATACTACAT TAAGTGAAAT 

142201 AAGCCAGAAA CAGAAAGTTA AACACCACAT GTTCTCACTT ATATGCAGAA GCTAGCTAAC 

142261 TAAGTAAATA AGTTTATCTC ATTGAAGTAA AAAGTACAAC AGAGATTACT AGAGGCTGGG 

142321 AATGGTAGGG GAAAGAGATG ATAAAGAGAG ATTCATTAAA ATAAGTTACA GCTAGATAAG 

142381 AGCAATCAGT TCTAGTGTTC TATTTGTACT ACAGAATGGC AATAGTTAAC AGTAATAAAT 

142441 AATTTCAAAG AGCTAGAAAA GAGGACATTG AATGTTTCCA ACACAAAGAA ATGAGAAATG 

142501 CTTGAAATAA TGGATATTCT AATTAATTAC CCTGATCTGA TCACTATACA CAGTATGTAT 
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14 25 61 AAAAATAACA CTATGGGCTG GGCGCAGTGG CTCACACCTG TAATCCCAGC ACTTTGGGAG 

142621 GCCAAGGTAA GCAGATCACT TGAGGTCAGG AGTTAGAGAC CAGTCTGGCC AACATAGTGA 

142681 AACTCCATCC CTACTAAAAA TACAAAAATC AGCCAGGCGT GGTGGCATGT GCCTGTAATC 

142741 CCAGCTACTC AGGAGGCTGA GGCAAGAGAA TTGCTTGAAC CCAGGAGGCG GAGGTTGCAG 

142801 TGAGCCGAAA TCGCGCCACT GCACTCCAGC CTGGGTAACA GAGCAAGGCT CTGTTTCAAA

142861 AATAAATAAA TACATAAATA AATATTTTTT AAAAAAAGAA CATCACTATG CACCCCATAT 

142921 ATACATATAA TTATTATGTC AATTTGAAAC ATAATTTTGA AAAATGAAAA AATGAAACAC
142981 AAATATGAAT CAATCCTCTC CAAGTTGATA TACTTAAAAG GAAAAAAGTC CGAGGGCTTA 

143041 AACTATTCAA TCAAAATTTT ATTAAAATGC TATAGTAATC TGGAAAGTAT TTCAGAATGA
143101 ATTGGTATAA GGTTAGACAC AAAGATCAGT GAAACAAAAT AGAGAACCCA GAAATAGATT 
143161 CACACATCTA TGGACAACTG GTTTTGACAA AGGTGTCAAG GCTATTTAAT AAGTAAAAAA 
143221 ATCGTCTTTT CAGTAAATGT TTCTTGAACA AGTAGACATC CGGTGTGGGG GAGAGGAGCA
143281 GGAGCCTTAC CTCAAACTTT ATGCAAAAAT TAACTCAAAA TAGACCATAG ACTTAAATGT 
143341 AAAAGCTAAA ATTATAAAAC TTCTTTAAAA AATAGGAGAA AATCATCAAC ACCCTAGGAT 
143401 TAGCAAAGAT TTCTTTAAAA CAAAACAACA GGTTTATAGT TTATAAAACA TAAATAACAA
143461 AATGATAAAT TTCATCAAAA GTGAAAATTT GCTTTTCAAA AAACATTATA AAATGAAAAG 
143521 CAGGAGGCTG AGGCATGAGA ATCACTGGAA CCCGGGAGCT ACAGGTTGCA GTGAGCCAAG 
143581 ATGGTGCCAC TGCACTCCAG CCTGGGTGAC AAAGTGAGAC TCTTCCTAAA AAATAAATAA 
143641 ATAAATAAAT AAATAGAAAA GAAAAAGAAA AATCACAGGC TGAGAGAAAA TATTTATAAT
143701 ACATGTATCT GACAAAGGAC TCGCACCTGG AAAATATAAG GAACCTTATA ACTTAGTAAG

143761 ATGACAAGCC AAAACAAAGA GTAAAAGTTT TCAACAGACA TTTCACAAAA GAAAACATAC
143821 AAATGGCCAG TATGCACATG AAAAGATTTT AAACATCATT AGTTACTAGG GAAATGCAAG
143881 TCAAAACCAC AATGAGATAC TTCACATTCA ACAGAATAGC TAATGTTAAA AGGACTGACA 
143941 ATCCCCAGGG TGAGCAAGGG TGTGGAGGAA ACTACTCTCA TATATTGTGA ATGTAAGAGG 

144001 CATTTTATGA TATAACTGAA TTCAGTTTTA TGTATAACTG AATTACGGAT ATGAGAATCT
144061 CAAATGAGGA CGAATGGTTT TTACGCACAA AACATGAGAC ACAAATCTGT AAGAAATATA 
144121 AAGTCGTGAC CACGTCCTTT CAGAACTTTA ACCTGTTTGC TGAAGTACGT CAGTAACAAT 
144181 GGCAGGGAAA GGGTATCTTA AATTTCACCA CAGCCTCAAA GAGGCCATTT CGTGGATCCG 
144241 CTGAGGCTTG GAGTCGGCCT TCTGACCACG AGTCCTGCGG CTATGAAAGA GGAAGCCGCG 
144301 GTTCAGGGCG TCCTCGCGAG TCGCGCAGCC CGCCCTGCTC CAGCTGGGGA CACAGGTGGT 
144361 CACGGCGCTT TCCAGCTGCA GATCCAGGCG GCAGCCCAAG ATTTGGTCCA GCCGCCAAGG 
144421 GGTGGCTCGA GTGACTGACG GGCCTTGAAC GCTCCCAGGA CCCACATCTG GAGAGGGAGG 
144481 TGGGGGTGGG GTGCTGAAGT CATTCTTGGG GCCCCTGGGG GCGGGCATGG ACCTGGGTAA 
144541 GGCCAGAGAA ATTGACACCT CGTGACATCC CTGGAAGAGA AGTACGTTCA GTGTCACTCC 
144601 AGAGCTGAAA GATACCGCCT TCTGGCTGGT CCCTCCTCAC CTACATACTT TTCTAATTTG
144661 TCTGGAGCAG GCCGGGCATC TGTATTATCT GGTTATTTAA ATATCTGGTT ATTTAAAAGC 
144721 TCTCCATTAA ATTCACATAC ACGAAAATAA AAATTAAAAA AAATTTTAAA AAAAAGAAAC
144781 AAAAGCTCTC TAATGACCAA GTCCTACACG ATAGTGAATA AATTTTTTTG TGTGGTCCCT 
144841 AAAATTGAGT TCATGCCTTT TCTGAAGTAA TAGACGCCCA GAGAAGGGAT CGACTTACCC 
144901 ATCATGCCAC AGAGATTAAT TGGCCCCAGA ATTCTTTAGC AGACCGTGTA TATGAACGTC 
144961 CTTTGCAATC ATATAAATTA ACTGGGAAAA CCTCATTTAG TATGTTACAT GCCTAGCGTT 
145121 TTGTGCCTGA ACACCTTACA AGAACCAGGG ACTATTGCCC CAATATTATA TTTCAGGAAA 
145181 GGAAGGCCCA GACAAATGGT GTCACTGGTC CACTTTCACC CAGTTGGTAA ATGAAACCAG 
145241 AAATTATAGC TGTACCACAG AAAGGTGAAA ACGTTTCTTT TATAATTTCA CATACAATCT 
145301 TTAATGGACC CAGTGTCCAA CACATTAAAG CAAGTGCTCA GGAGTGACAT CAAGATGTAA 
145361 AAAATAGTCC TGTCCTCAGG GAGTTTAGGT CTTGGAGAAA AGAGACCCAA GGAGACACAA
145421 GACAAAGGGG AAAGAGAAGG AGCGCTGAAG ACTGAGGACC CTGCCTGTGG ACTGAAGTGA 
145481 GGATGGGGAC ACCCGATGCC CGGAATATGA CAGTTTGGAG GGGCCTGAAG GACTCTTCTA
145541 TTCTCTATCA GAAAAACAGA ATTACTCTCC TAACCAGAAA AGGTATTTCA ATTTATATTT 
145601 TCCATCACAG CACTTTTCTG GTGATAATTT AATGTGTTTT AAAAAATGTA TCACAGTGAT 
145661 GGCCTGGTGT GAAATAAATA ATAAAATTTT AAGAATTAAA AAATATAAAA ATCTTTTATA 
145721 TAGACATTAG GAGTTACAAG GATAACTGTG AATTATAATT AGTAATTAAA TTGAAATACT 
145781 GATTATTTTC ATTTTTATTT AATTATTTAA TAAAACCTAT TTAACATTTA ATATTTATCA 
145841 GTAATTAAAT CTAATTGTTA ATATTTATTA TTATAAATTA TTTTAGAATT AAAAATAAGT 
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14 5901 GTAGAAGCGA GGCATGGTGG CTCAAGCCTG TAATCCCAAC ACTTTGGGAG GCTAAGGTGG 

145961 GAGGATTGCT TGAGCCCAGT AGTTCAAGAC CAGCCTGGGC AACATGGAGA AACCCTGTCT

146021 CAATACAAAA AAATGAGCCA TGTGTGGTGG TGCGTGCCTG TATTCCCAGC CATTCTGGAG 

146081 GCTGAGGTGG GAGGATGACT TGAGCCTAGG CAGTCAAGGC TGCAGTGAGC CCTGATCTTG 

146141 CCACTGCACT CCAGTCTGGG CAACAGAGCA AGACCCTGTG TCAATATACA TATGGACAAA 

146201 CTTAAAATTT AAAATGAAAG CATACTACTG ATACAGAATT GAGTAGAGAT GCAAAGCTAG

146261 TCCTATAACC AGAACAATAA AGATAAAAAG GAGAGTGGAA GAAGGTATGT CATGAATTTC 

146321 ATGATAAATG GCAATTGCAA ATATCCTGTA GCAGAACAAA ACAACAAAAT TGTAGATAAA 

146381 ACATATCCAA CCCTTTGGAA GGCCAAGGAG GGAGGATTGT TTGAGCCCAG AAGTTGGAGA 

146441 CCAGCCTGGG CAACATAGTG AGACCCTGTA TCTAAAAAGG AAGAAAGAAA AAAAAAAAAA 

146501 AGGATGATAA AGTAGACAAT ATTGAAAGCC ATTTTCTGCA AATACATAGT GAATTTGATC 

146561 AGTAATTTTC TTCCAACAGT GCAAAAATGA ATAGATATTA GTTGCCTGAA ATAAAAATCA 

146621 AATATCCAAC AAAAAATATT GACTATCTAA TAGTATCTAA GCTAGTAAAT TTGGCCAGTT 

146681 ATAAAATGTC TTAAATTTTT ATTTAAAAAA AGAAAACCAT ATTTATAAGA AGAGGTGATA 

146741 AAGAGAAATT ATTTCAGTTA TGAAGATTTT GTTAGAAAAC TATGAGAAAA AAACTATTTT 

146801 TTGTTTTCAA AAAGTGAAAG ATTAAGTTAC CAAACAGTTG CTAAAGAATA CCAGATGGCT 

146861 GAGCGTGGTG ACTTATGCCT GTAATCCCAG TACTTTGGAA GGCCAAGGCA GGAGGATCAT 

146921 TTTAGGCCTG GAGTTCGAGA CCAGCCTGGG CACTGTAGCA AGACCCGTCT CTATTAAAAA 

146981 AAAAAAAAAA AAAAAAAAAG AATACCAGAC CTTGCTAACA ATAGCAAAGA TCAATTAATT 

147041 CAAAATTTGA AAAACTGTAA TTTATTTAGC TTTAGAGTAC TCTCGTGATA TGAGATTGCC 

147101 AAATTAATAC TTTGGGTGCA TTTCTTTTCT CAAAGGACTT GCAAATTTAC AAAGAAGTGT 

147161 TGAAGAAAAG CCACACATTG GCAGGTAATG TTTGCAAAAG ACAGATCTGA TGAAGAACAA 

147221 TATTTTTAGA ATATACAAAG AATACTTAAA ACTCAACAGT AAGAAAATAA CCTGATTTAA 

147281 AGCAGGCCAA TGACCTGAAC ATCTGTTCAC CAAAGAAGAT ACACAGATGC AAGTATGCAT 

147341 ATGAAAAGAT GCTTGACATC ATGTCATTAG GGAACTGCAA ATTAAAACAA GTAGATACCA 

147401 CTGCATACCT AGTAGAATGA CCAAAATTTA GAACACTGTC AGCACCAAAG GTTGCAAAGA

147461 TATGTAGCAA TAGTAACTTG TTCATTACTG GTGAGAATGC AAAATGTGCA ATCACTTTGG 

147521 AAGACAGTTT GGTGGTTTCT TACAAAAGTA ACCATACTTT TACCATAAGA TTCACCAATC

147581 ACACTCCTTA GTATTTATCC AAAGGAATTG AAAACTTATC TCCACACAAA AACCTGCACA 

147641 TAGATGTTTA TAGCAGCTTT ATTCATAATT TATCCAAAAC TTGGAAACAA GATGTCTTTC 

147701 AGTAGGTAAG TGGATAACTG TGGTACTTCT GAATAATGGA ATGTTATTTA GAGTTAAAAA 

147761 GAAATGCATT CACTTTGGGA GGCCGAAGTG GGTGGATTGC TTGAGGCCAG GAGTTTGAGA 

147821 CCAGCCTGGT CAACATGGGA AAACCCCAAT TAGCCGGGCA TAGTGGCGTG AGCCTGTAAT 

147881 CCCAGCTACT CGGGAGGCTG AGATATGAGA ATCGTTTGAA CCTGGGAGAT GGAGGTTGCA 

147941 GTGAGCCAGT GCCACTGCAC TTCAGCCTGG GCAACAGAGC AAGACTCCTC TGTCTCAAAA 

148001 AAAAAAAAAA AAAAAAAAAA AAAAAAAGAA AGAAAAGAAA AAAGAAAAAG AAAAAGAAAA 

148061 GAAACGATCA AGCCATGAAA ACACATGAAG GAAACTTAAA TGTATGTTAC TAAAAAGCCA

148121 ACCTGAAAAG ACTGCATACT ATATGACTCC AACTGATGCA GGGCAAGCAA GCCAAAAATT

148181 AGGGCTTAGC CCGGGAAGAA TTCAAGGGTG AAGTGGTGGT GTTAGCAACT TTTACTGAAG 

148241 CAGCAGTGTA CAACAGCAGA ACAGGTACTG CTCCTTGCTG AGCAGGGCTA ACCCATAAGT 

148301 AATGTGCCCA GAGTAGCAGC TCAGGGGCAG TTCTGCAGTA ATATACCTGC TTTTAGTTAA 

148361 GTGCATGTTA AGGGGGATTA TGCAGAAATT TCTAGAAAAA GAGTGGTAAC TTCGGAGTAG 

148421 GTACAGAGGA AAGAAGTCGA TAATGTCCTG TTGTTGCCAT GGCAACGAAA AACTGACATG 

148481 GCGCTGGTGG GCGTGTCTTA TGGAGAGGTG CTTTAACCTC GTCCCTGTTT CGGCTAGTCT 

148541 TCAATCTGGT CCGGAGTAAA GTCCCTGCCT CCGGAGTTCA CTCCTGCTTC CTGCTTCACA 

148601 ACTGTATGAC ACTCTAGAAA AGACAGTAAC TATGGACACA GTCAAAAGAT TAGTTGATAG

148661 AAATTGGGTG ACAGGAAGTG TTGAAAAGGC AGAACACAGG ATTTTTAGGG CAGTGAAACT 

148721 TCTGTGATAC TATAATGGTG AATACATGAC ATTATACATT TGTCAAAACC CATAGAAAGC 

148781 ACAACACCAA GAATAAACCC TAATGTAAAT TACAGACTTT CGTTGATAAT GACGTGTCAA 

148841 TGTAAGTTCA ATTGTAATAA ATGTACTACT GTGGTGCTGG ATGTCTATGG TGGGGGGACA 

148901 TTTTTGCTTC AATAGTTACA GTTGAAGTAA ATGTTTGTGT TTCCCACAAT GCATATGTAG 

148961 AAACTCTCAC ATTCAATGTG ATGGTCTTTG GAGGTGGGCT CTTTGGGTGA TAGTTAGGTT

149021 TAGTTGAGAT CCTAGCAGAT CGAGTCTTCA TGATGGGCAT GATGGGACTG GTCCCTTATA 

149081 AGAAAAGACC AGAAAGCTAG CTCTCTCTTT GCCATGTGAA GACATAGCAG GAAGGTAGCC 
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149141 ATCTGCAAGC TAGGAAAGGG CCTTCACAAA GAATCAACTC AGACCTCAGA ACAGTGAGAG 

149201 ATAAATTGTC GTTGTTTAAG TCACTCAGGC TGTGGTATTT TGTTTCAGCA GCCCAACCTA

149261 AGACTGTTAA TTGGATTAGA AATTTCCTTT TGGGGATGGT GTGTGGCGGG GGGTGCGGGG 

149321 AGTACCTTTG TTAAGCTTTT ATATCAATGA GTTTGTAGGC TTTTCTTTTT TGGTCATTGA

149381 CTAGGACAGT TTAAATAGTA TGAGTGTGAA GGAGATTGTT GGTCATCTAT TCGATGTCCC 

149441 TTCTCTGTTT TTTAATATGA GAACTCCTGA TTTTCAGCCA ACTACCCTGG AAAAAAAGCT

149501 AATCTTTCTG ACTTCTTAAG TGTGGCCATG TACTAAATTC TGGCTAATGC AAGGCAAGCC

149561 AAAGGTTTTA TGATAGGTTT TAGGACACTA GAGTAAAAGA GAGCTGTTGC ACACATGCTC 

149621 TTCACCCTAC TTTTGTGTCC TTTTTTCCAT CCTACAACTT GGGTTGTGAG TATGATGGCT 

149681 GGAACTTTAG TGGCTCTCTT GGATCCCAGG GGTAATTGAG GGGTGGCTGG AAGGAATCTG 

149741 TGATTTTCTG GAGTTTCCAT ACACAAACAA GACCTGGATT TTCTGGGCTT CCCAGACTTC 

149801 CACATCTAGA CTTGCTTTAA ATGGGAGAGA AATAAACTTG TTTCAGCCAC TGTCATTTTG
149861 GGCTATTTTA TAGAACTTAA TCTAATCTTC AAGGGTACAT GAATTGCTTT TCCTTAAAAA 
149921 AAAAATCAGC CATAAAATCA TCTTCTTTTT TCTTTTGTTC CCCACATTAT TTAGTTGGAG 
149981 CTCTGTAACT TTTTTTTTTT TTTTTTTTGA GACAAGGTCT TGCTCTGTCA CTTAGGCTGG 
150041 AATTCAGTGG CATGACCATG GCTCACTGCA GCCTTGCCCT CCTAGGCTCA AGCAATCCTC 
150101 GTCTCAGCCT CCTGAGTAGC TGAAACTAAG GCACATGCCA CCATGCCCAG CTAATTTCTT 
150161 TTCTTTTAGA GATGGGAGCC TTGCCCAGGC TAGTCTCAAA CTCCTAGCCT CAAGTGATCC 
150221 TCCCATCTCA GCCTCCCAAA GTGACAGGAT TACAGGTGTG AGCCACCATG CCTGGCTGCT 
150281 CTGTAAGTGT CTGAATTTCA TTTTGTATTT ATCAGTCTGT TTAGATTTTC TTTCCCTTCT 
150341 TGGGTCAGTT AGGCCATTGG TTTCTTTTTA AAGGTTTTCA AATTTATTTG CATCTAATTC 
150401 TTCAAATTAC TCTCAAAATT ATTCCAGTAT ATATTCTTTT GTTCCTATTT TCTTCTGTAT 
150461 TCTTTATTAA AATAGCTAAT GATTTATCTA GCAGGACTTA TATTCTTTCC ATAACTTTCC 
150521 TGCACCCCAA TTAATCTCCA ATTTTATATT TCTTCTGGCC TTCCTTATAG TTTCCACAGG 
150581 TTTATTTTAT TCATTTTTTA AAACTTTTAT TTAATTGTTT ATTTTATTAT CATTCTTTCT 
150641 TATTCAGCAA TCTAAGTGCT TAGGGATATA GAATTTCCTC TAAGCAGCAT ATGCTAGGCT 
150701 TTAACAATGT TAGGGAGGCC TCCCCTTTCT GGGGAAGACC ACACTTACAT TAACACAGGA 
150761 CTGTGGGATG CCAAGAGGTA GAGAAGAGCT TATGAATATC CAGATTACAT CTTCACTGAT 
150821 CCTGCACAAA GGTGGGGTTC CTCGGTTACC CACTGGGTCC TATTACCCAA GTCTGGGTCA 
150881 GCATACCGAG ACTACGGGTA TATAGAACAA GTGCAACTGG CGATAATCCT TCTGTTGGGG

150941 AGAAAAATCT TTTTTTTCTA TTCATCTTAG GTTCTCCATC TGTGGCCCTA TCAAGTAGAC
151001 TAACAAAAGA CAGATTGACA AGACAGAAAC AAAGCATGTG CATTGTACAA ACACAGGGGA 
151061 GTACTGAGAT GAATACTCAA AAGAGGATTT AGAACTTGGG CTTATATAGC ATTTTAAGAA 
151121 AAGAATACAT TTTTTAAGTG ACAAGGAAGA CGAAAAGGAC TTTGAGTTTC TAGTGCAGTA 
151181 AATTGTGGGA AGGCAACTTT TTCTTTCCCT TTTTTTTTTT TTTTTTTTTA AAAAAAAGAC 
151241 TTCTCTGGTG CTATGTCCAG GCTGATAAGA GTCTAAAGTC TCTGGTGACT AACTTTTGTT 
151301 CTTCCCCGAG TAAGAAGACA CCTTCACAAT TTCATATCCT GCTTTTAGGC AAACAGGGAG 
151361 AGGGCAGAGG TGTTTGTTTG TTTTTAATCT ATTTTTTTTC TCAATTGTCT TCAACTCAAA 
151421 ATACTTCTTA TGCCAAAGAT GGCATATTCT GCTACCCTTC ACTTACTACT TACAACCCAG 
151481 CCTCTATCAT CATAATTAGA ACTTCTGACC CTGGGGAACA TGGGCAATAG TTTGAACTCT 
151541 TTTATATCTC CCTTAGGCAG AGATGGAGGC CCAGCCATGC CTCTGACATC TAGACACAAC 
151601 TGTTGCTTCA TTTCTCCTAT TCTCAGAGGT GATGTTGTAG GACTTCAACA AATATCAGTA 
151661 AACATTAATT TTTTTTTTCC TTGAGGCACA GCATGATCTT GGCTTACTGC AGCTGCTGCA 
151721 GGCTCAAGCA ATTCTCCTGC CTTGGCCTCA CGAGTAGCTG GGTTACAGGC CCCTACCACC 
151781 ATGCCCGGCT AATTTTTGTA TTTTTAGTAG AGACAGGGTT TCACCATGTT GGCCAGGCTG 
151841 GTGTTGAACT CCTGACCTCA AGTGATCCAC CTGCCTCAGC CTCACATAGT TCTGGGATTA 
151901 CAGGCGTGAG CCACCATGCC TGGCCATCAA TTTTTATGTC AACTCTAAAT TATAACATTT 
151961 AGCAATTTTG TGACTTTTTA TGGTCATCAT TAATGTTGTT TATGTTTTAG TTGTAGTCCT 
152021 GTCATTACTC ACTCGGGTAT GGTAATTTGG TCTTTTTCAA AATGAAGTTA AGGTCTATTT 
152081 GCTCTTCTCT GAATCATAAT AAGAACTGCC AACAGCCATT TCAGCAATAA CTATTTACTG 
152141 AGATTTTAAA ATATTTCAAG GTAATTGGTC CTAGCAGACT GGAAAATACC AAATTCTTTT 
152201 CCAGAACTGA ATCCCCCATC AAAGTTCAAT TTTACTCATA ATTCCCTTTT CATTTGAAGC 
152261 ATCTCATTGT AAGCCAGTCT TAACCCTTCT CTCACACTTT GCTTGGCTGT TTCTCAGGTA 
152321 GAACTCAGTA AGTCTGGTAG CCTCCAGGAC TGCCGCTTAG ATTATTAAAC AACATGTCAG 
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152381 TGGTTGGAAG AGTCAATGTT ATTTTGATTT TTCTGTTTTG TTTTGTTTTA AATGCAGTTG 

152441 GCGGATAATT GCAGCTTTCT TTCATTCCCT ACATGAGTTC AAATGGCAGC AAACAAACTA 

152501 GGAGAACGCA GACCTTCTGA CTTGTGGGTA CCCCTACTCA TCACCTGAAG ACCCTTGGAA 

152561 ATCAAAGCCC TGACCCATTA AAGACGGATG GAGACAGCAA CATACGATCA TCACTATTAT 

152621 CTTGCTTTGC CCCAGTCCAG GTTAACCATC TGTGGTATTT TTAGTTGCTA AGTCCATATA 

152681 TTCAACATAA ATCAATTATA TATCCACTAA AATCTCAGCA CTAGTCTAAC TACTAAGGAA 

152741 ATGACAGCGA AGAAAACAGA CCAAACGTCT GCCCTTATGG GATTTATATT ATTTTCTCTG 

152801 TGCTGGTTAA ACCAAGGAGC TTCTGCTCTT TTCCTTAGTC ACCTGGGGGA GGCAGAAACA 

152861 AAGGAGAATA TTGATAAACC TGGAAATAGG GCCGGAGAGT ATCAGAGAAG GAAGCCTTCG 

152921 GGAAAGTAAA GATGTGGCAG CCAGTATTCC CGTTATAAAA GGATACAACT CCGGCCTCAT

152981 AGTCCAGAAA AATTCCCACA AGCAGGGGCT GCTCATGCAG ATGAAGGGAA GTTGGGGGAG 

153041 AAGTAAGTGC TACATAGCCT TTCTTTTTGC ACAGCCTGAG GGTCCAGAAT CCAGACTGAG 

153101 GCTCTTGCTT CATGCCAGTG CCCCTCTGCA CATTTTCCAT ACAAACTCCT AAATCCCATC 

153161 CGGTTCCTTC GCCAACATCC ACTTCAAAGT AACGTCTTCC TGAGGTGAAG CCTTCACAAC 

153221 CCAAGACACA GGGGAAGGCA GTAAATCTCC TGGAAGATGT GTCCTGATTC TCCTGGGTGT 

153281 ATCCACGAGT CACTTGTCTC CGATCCTCAG AGAGAATTAG TTCGTGATGA GCTGTATCTG 

153341 GATCCAGAGT CACACTAACT GCAAAACAAA ACAAAACAAA CAAAAATAAT TTTGTTGCTG 

153401 TGAAGAACAC AGGTTATTTT ATTTTATTTT ATTTTGAGAT GGAGTGTTGC TGTCACCCAG

153461 GCTGGAGTGC ACTGGCACTA TCTCAACTCA CTGCAACCTC CACCTCCTGG ATTCAGGCAA

153521 TTCTCCTGCC TCAGCCTCCG GAGTAACTGC GACTACAGGT GCGCACCACC ACAAGTGGCT 

153581 AATTTTTTTA AATTTTCTGT AGAGATGGGG TTTCGCCATG TTGGCCAGGC TGGTCTCAAA 

153641 CTCCTGACCT GAAGTGTTCC ACCCACCTCG GCCTCCCAAA GTGCTGGATT ACACAGGTGT 

153701 GAGCCACCAT GCCCAGCCAC AAGTTATTTT CAATAAAACC AGCCTGTGTT CAAACCCAAC

153761 TATTGTTTCT TATAAACTGG GTGAGCTTAG GCAAATCATT TAACTTTCTG AGCCTCAGTT 

153821 TGTTAACTAT AAAGTGGAAA TTACCGTATT TGTTGCAGAG AATGGTGGGT AGGATTGAAT 

153881 AAGCTTATGT TTGCTTAATG CTTGGTAAAA TTCCTGGTAC ATGGTAACCA CCTAATAAGT 

153941 GGTAGTTGTT GGGGTGATCA GGCCCAACAC CAGGCCGTGG GGGCTACAAA GTCCGGCGGG 

154001 GTCAAAGGAA TGAGAAAAGA CAAGTTAAGA GTGCATAAAG TGGGTCCAGG GTGCCAGCAC 

154061 TAGATTGGAG GCTGCAAAGG CCCTAAGCTC TGGGAGCCCA CACTATTTAT TGGTGATCAA 

154121 ACAAAGAAGC AGGTGGTGAG GACGTGAGGG TAAACAGGTG AGGGCATGAG GACATGGGGG 

154181 TAGAAAGGTA GTGGTGCATT AAGCGTAGCT GTGACAGTTT AGCATTTTCT TTGACACATG 

154241 TAGAATATAC TCTGCTGCTT GAGATAGTAG AGGACACGTT TATGAGTGAA AAGCAAGGAA 

154301 CCAACAAGTC TGTGCACTTT CCAGAGGCTA TGAGGGGTTT TATGCCCTGA GCCCTGGGTT

154361 CCATCCAAGC CACAAGGGGT TTTATGCCCT AGGCTTAGAT TTGTGGTGCG GCAGGGCAGC 

154421 CTTCCACCAT TTGGCACAGA GCTTGGTGTT CCAAAGGCCA CGAGGGGTTT TGGACCCTGG 

154481 ACCCCGGACA TCTTCCAAGA CTCTTTTACA TTATGACAGA CAAGCCAGTC CTGCTTCAGC 

154541 TCTTCTAACA ACATGTAGTA ATAATGATAT CATCAACATC ATCTTCGTCT TAATTATTCA

154601 AGGATGCCAA GGTACAGAAC TAACCTGTTA ATATGGTTAC CATCCTGTCC AAAGTTCTTC 

154661 TCCCATGCAG GACTTCCAGG AATCATGAGA CAGTTGAGCA GAAAGATACC TTTTCCCTTC 

154721 TCTACTGAAT AACCACCAAC ATTGAGAATC AGAGAGGGAA AATGACTCAG CTAATGTCTT 

154781 AGCTTGTTAT TGGAAGACCC AGGTCTCATG ACACATGCCT AGTCCCATGA CTTTTAATTG 

154841 TAAGCTCTTC TCTTTCCCCT CAGATAATGT TCCATAAGCA TTAGTATGAG ATAATAATAC 

154901 ACTGAGGACC AATATACATG AAAAATATCA GACTAGAATC AAACAAGACA GAAAAAAGAT

154961 CTGATAACCT AAAGTGAGAT ACTGAACAGT ATGCAGTTTT AAAAATAAAA AATGGTAATA 

155021 GGATGTTCTA ACAAGAGAGT TAAGAAACCA CTGTGCTACT GAGTTAAATG TTGATCAGTT 

155081 GGTCTGTGAC AATTAAGGAA TTCAAGTATT CAGAAACACT TCCTGTGCTG GATGCTCTCT 

155141 GTTTGTTCTT CCAAATAATC CCTCACTTTT CCCTGTCTTG CTCTGTGCCC AGGAAGGCTG 

155201 ACATGGACAG ATTAACCAGG CTTTCCGCCC TCTGGCTTGG TTCAGCCAAT GGGAAGCACC 

155261 AGAGGAGACC ATAGGGCACA AAGAAGCAGC CTTGGGAGTA TTCAGTACCC CAGTCCCACG

155321 CTATGATTTG GAGGGTCTGC ATTCCTCTGC CTCTGGGCAC ACTCTAGTAT AGTTACAGCT 

155381 CCCTACACCT GCCACTTGAG GCCCAGAGGA GGTGATGGCT CTCTAACTGT TCCTAGTTCT 

155441 GGGTGCTTCC TGTTCCTTGT GGATTTCCCA ACTCCTCACC TTTGTAAATA CCCTCCTTTT 

155501 TCAAACTCTA TTCAGTTAGC TTTTATCAGC CTGACTCACA GAAGTTTGGG GTTTCAATTC 

155561 ATATTACCTG AATGACCCAG GAAAACCCAT GTTGAGAAAT TAAAATGTTT ACGGGGTGGT 
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162101 AGACTCCATC TTGCTGGCAG ATTTTCTCTA AAGAGTCTGT CTCCTGAGCT CTCTCTGAAG 

162161 AAATAACTGG CCATGTTAGA AGCCCATGTG CAAAGAGCTG AGGGGTGGCC TGTAGAAGCT 

162221 GTGGGCAACC TCCAGCCAAC AGCCAGAAAT AACCAGGGCC AAAGTCCTGC AACCATCAGG 

162281 AAAGAAATTC TGCCTGCTAT CTCAGTGAGC TTGGAAGTGG ATTCTTCCTT AGCCTAGCCT 

162341 CCAGATAAGA ACACAGCCTG ACCAACACCT TAACTGCAGC CTTATCAGAC CCTAAGCAGC 

162401 AGGCCCAACT AAGCTGTGCC CAGATTCCTG AACCACAAAA ATTGAGATAA CATATCAGTG

162361 TTGTATTAAG GTTCTAAATT ATGGTAATTT GTTTGTACTA ATAGATAACT AATATAACCA 

162421 CCAAATCATT TCAGGTTAGG CCAGATTTTT GTAGCCAAAT GAATCATGAT AAAACTTTCC 

162481 ATTTTCAGGG GTTTTTTTGA TTTTGTACTT ACGGATACAA ATTTGTGAAA GTATAGTCAG 

162541 CACTGATTTA AAAAATCAAG GGAGCAGGAA ACTCAGTAAA TGGTTCTAAC ATTTTGGAAT 

162601 CTGTAAATTG GTTGTAACAT TTGTCATCTG TGTTATCTAA GTCAAGTTCC TAAAATATGT 

162661 GAATGATAGG TTATCATACT CACCTACTTT TCTTGCATTG CTCTAAGAGT TGGCTGAGCT 

162721 ATTGATAATA AACACTATGA TCAGATCTAA TACCATGATG TGCTATTATG ATCATGTGTC 

162781 AGTCACAGGG CTAAGCACTT TGTACATGTT GATGCATTTA ATTTTGATGA TAACTCAATG
162841 AAGTAGGAGC TGTTAATATT TTCATTTTTC AGAGGGGGAA ACCAAGTCAC TTGGAGTAAC 
162901 ATGGCTAATA AGTGAAAGAA TAAGAATTTG AAAGGTTTGC ACAGATAACC AGAATGCAAT 
162961 GCTCATCACA TTCACTGAGC AGTGAATCAT ACTAACTAGA GAAAGTATGA AAGCTCTACT 
163021 GAAATTAACT AAACAACCTC TCTGGCTGTG AGCCTGCCAA GGGACAGGTG GTAAACTTGG 
163081 TTACTGCATA AGGCCCCTTC TATCCACAGT ATTCAGGAAT TCTTTAGTGA ACATACCTTG 
163141 ATGACTCCTT AACATTTTCT TCACATCGAA GTAAAGCTTG GAAACATTGC ACATAGTATG 
163201 AAGTTCCAAG GAGACAGCCT CTGATGTTTC CAGCTTCACA GCCCAACTCC TAGAATAAGC
163261 AGAGGCGAGA GATTTCTTCA GAGGTGCATT CCATTCATTT CTATATACGC ACACCCCTCC 
163321 CCTCCTGCAT TCAAACAGGA CTTACCTGCT CAAAGTGTCA TTCACATTCT ATAAAGAAAC 
163381 AAAAAGAAAA GGTGAGCATG GGAACATCGG TATTTCATGG GGCTTGTCAT GCAGGGCTAT 
163441 TCTTCTTTGC TTTACCCGAA GAAGTAAAGA GAGTTACCCT AGTCTTAGTC TTAGATATTG

163501 ATGGATACTC AAACAAAGTA ATTCCCACCA GTCTTAGGTA TTGATGGATA CCCAGATGGA
163561 ATAATTCCTA CCAGCTTCTG GGAGATTCAG CATGGCAGGA TGTTTATCAA CATTTGCATC 
163621 TATTCTCATC CTTGCTGAAG TCTGAGGGCC AGGAGCTTTG TCCATGCTCC CTCTGTAAGG
163681 ACTAGCTTTT GGTGATCGGA TTTCCTTCAC AGTGAGCCCA GATTAGAGAA CACTTATCAT
163741 AAAGGTCCTT AGTGGTGAAT CTGTGCACAG CCCTGAGACT GGGCCACTGC CACTAAGATG 
163801 GTGGTAGCAG GTATCACACA GTGGTAAAGC AATCATGCTA TACACTCAGC CTTACAGTAT
163861 AGTCACCAAT CCTGTTAGTT AGAACCAGAA TTAATGGCTC CAGATGTTTA TCTTCCTACA 
163921 GATAAAGCTG TAGATTGTAC CATAACAGCT CTGGAGCAAG GGTTCTACAA GCAAATCAGG 

163981 GAAAAGGTTA TCACTCATTT TGGCTGCCCC ACTTCATCAC CCATCAGTCA CCTAGTGGAG

164041 TATTTCAGGA GAGAGTCAAC AACCAGGGTT CTCTGCACAT GGGCCAAGGA GGCAAACAGT
164101 GGTAAATGTT ATCCCGTGGT TTCATTTGGC CAAGCTGTGT TCCCTCAGAA GTTTATTTTT 
164161 CTAATTGACA TAAAGGTACC CTATAAATTA GTGAAGGCCA GCCTGATGGC ACTGATGTAC 
164221 ATCTAAAAGA AACATTACTT TATCTTCCCA TGCTTCCTTA CCATTCTCCT TTAATAGCAC 
164281 TATAACATAC CTTTTTTCCC TACTCCAAGT ACACAGCCTC ACCTGCAGCA ATTTCTGGGC 
164341 TGAGCCCTGA CATTTTTCCT CCAGTTCCAG GATGTGGCTC TTGAGTTCAT TGCTCTTCAG 
164401 CCCCAGACCA GCCTCATAGT CCCTCAGTCT ACTCAGAGTC TGTTGTTCTT CTTTCTCCAG 
164461 CCTCCAGAGA TAAGACTTCT CTTCCTCATG TAGGAAACAC TGGAGATTCT TAAAGTCAGA 
164521 CCGGATTTTT TGTCTCTGAA TCTGTACCTT CTCCTGGAGT CAAGAAAGTA TGGTCAAAAG 
164581 GTGGAAGTAA ACCAAATGTC CATCTATGGA TGAATGGATA AACAAGAATG AAAGTCTGAC 
164641 ACACGCTACT ACATGACAAG CCTTGAAGAC ATTCAAGCAA AATAAGCCAG AAACAAAAGG 
164701 GCAAATATTG TAAGACTTTG CTTATACAAG GCATCTGGAG TAGTTAAGTT CATAGAGACA 
164761 GAAAGTAAAA TAGTGGTTAC AAGGTGTTGG CAAGACCAGA AAATGGACAG TTATTGTTTA 
164821 ATGGGTAGTG AGTTTCAGTT TAGAAGATGA AAGATGAAAC TGAGTTGCAG TTTGGAGATG
164881 GGAATGGTGA TGGTTGCACA ACAATGTAAC AATGTAAAAG CACTTAATTC TACTGAACTA
164941 TATACTTAAA AGTGGTTAAA TGCTTAAGTG TTATATATAT TTTCACACAA ACACACACAC
165001 ACACACAATC AGCCACTGGG ACATTATTTT CTCATGAGTC ACTGAAGCTG GAAGAATGTC 
165061 CCCAGTTTCC TGCTGCAGAG TCATGTGTGG GAGGCAGGCA CTCAGATGTG GAAGAGGTTG 
165121 CCTCAGATTC CTTATAGTCA CCCAATTAAT TTTCTTGTTC TTCAGCCAAG ACACAGGAGA 
165181 AAGCTGGGTT AGGAGTGCTA GATAATTTAA TTGTGAAACT AGGGCCAAGT TCAAACACTT 



Figure 9 (Page 51 of 74) 



SUBSTITUTE SHEET (RULE 26) 



WO 98/14466 



PCT/US97/17658 



140/162 



165241 TATCAGTTAC AAGGATAAAA AGAGGTTTTT ACTTATGATT TAAGAAGTTA GATTTCTGAG
165301 TTGGAGCGAT TTTCTTGAAG TAAAAGCTTA TAATGAACAT CACCCAGACT GGATTTTAAG
165361 ACAACCAGGC TGGTAAGAGG GTCCATAATT CTTGGCAGGG GGAGCTTTGA GTGTGACAGG
165421 CATTTATTAT GGTTAACTGA GAAATACTGT TCTACTACCC TAGGGTCATC TTAAGCATTC
165481 CTATGTGTAA GACTGACAGA AATCAAGTGA AACTCTCATC TGAGGAGATG TAAAGTTGCA
165541 ATTTCCATTA GTGCTGTCTA AATTAATGCA GTGGGAGTGT GTATTCAGGG CAATTTGAAT
165601 CTATGTTCTT GGATTGCAGT CTTCAAACTT GGCCCAAATA AACTCTCTAC TTATCTTAAA
165661 AAAATAAAAA TTAAAAAATA AAAATAAATT CATACAGTGT TTTGATGACT ATGATATAGA
165721 AGAAGGGTCT TTGACTTAGG ATGAGGTGGA ATTTTTGTGT AGGAGACAGG TGCAGCTTTA
165781 ACTCTTGTAT AGACGGGTTT TCATATATGT TAGTTACAAT CAAGGTCTTC CCCATTGCCC
165841 AAGATCCTAG AAATGGGGGA AGTAAGAGTG TACTCAGGAG CTCAAGAGCA ACATCCACAA
165901 ACAAAGATCA GGGTAGAGGT TAGAGAGGAC TCCTGAAAGA GAGAAAATTG GTAATCAGCT
165961 TGTGGGATTT TACTGCAAGC TAGTGAATTA TATAAATATA AAGATTGGTG CAAAAGTAAT
166021 TGTGGTTTTT GCCTTTACTT TAATGGCAAA GACCGCAATT ACTTTTGCAC AAACCTAAAT
166081 ATTTCCATAA AAGAATGTGG CTCTGATAAT GTGGAGGTTA GTCAGCCACG GAAATAATCT
166141 GAAAGTTTGT AGTTGCAAGT GTGTAGGTTG TTGCATTACT TGTGATGTAC TTATAAATCA
166201 AGTATAGGCC GGGTGCAGTG GCTCACGCCT GTAATCCCAG CACTTTGGGA GGCTGAGGTG
166261 GGTGAATCAC GAGGTCAGGA GATCAAGACC ATCCTGGCCA ACATGGTGAA ACCCCGTCTC
166321 TACTAAAATA CAAAAAATTA GCCAGGCATG GTAGCACATG CCTGTAATCC CAGCTACTCA
166381 AGAGGCTGAG GCAGGGGAAT TGCTTGAACC CGGGAGGTGG ACATTGCAGT GAGCTGAGAT
166441 CGCACCACTA CACTCCAGCA AGACTCCATC TCAAAAAATA GTAATAATTT AAAAATAAAT
166501 AAATAAATAA AGTATATTTC TTTCATCAGC TTCATGAGCT TGAGTAGTAT GAATTTCAAT
166561 CTGGAGTGAT CCTGTTTTCT AAGTGTTCAC AAAGCTTGGT TTCTGTACCT GTAAAGTTGA
166621 GAGCCAGATG CTCCACTGTG GTAAAAGTGC CAGGGTAATG AGTTGAGGCC TGCAAACCAG
166681 GTTTATTTTG AGGTATTTAA AGTTTGAGAC CCACTCGATG CTTTTTCTAG GTAAATAGTC
166741 ATACTAATTC TGCTTCTTCT GACTGAAGTA TCAGGAATCC CAGCCAACTA CAGTTTAAAG
166801 ATGGAAAGAT TGGTGCTAAA TACTCATGGA TGTAAACCTG GAACCAGGGG CATAAGTACA
166861 AATAATGGTT TCTTCCTTGG GTTTCATTTT TTCAATCTGG TTTAGTGAGA ATAAATCCTC
166921 ATTGTGCTTT TCCTCAATCA TCCCCTATGC CTAAGCTCTA GAATGGAAAA TAGCTTGAGA
166981 TCAATGAAGT CAGATTCTTA CTTTCCATTT AGTTATTCGC ATTGCTGTGG ACAGCTTCTG
167041 CTCCGTACAT CTGTCTTCAA GTTGCTTCAG TTTTGTCACA GCTTTCTGGA GCTTTTCCTG
167101 AAGGAAAAAT TTGATAAGTG AAGCCTATTC AATTTGACTC TTCATTAGGG ACCTAGGGGG
167161 AATCCCAATC TTCTAAGATA TATTTGAATA ATAGTGAATA TTTATAGAGT CCTCATTGTT
167221 TTTTGCTAGA GAGCATGCTA AAGGCTATAT GTGCAGGAAC ATACTGATCC CCTTGGCAAC
167281 CCTGAATAGT TGGTAGGATT TTAAACTTCA TTTCTGTGCT GTAGAAAATG AGACTAAGAA
167341 AGGGGTAAAA TAACTTGCCC AAAGGGCTAT GACTGCCAGG TGGTGGAGCA ACAATTGCAA
167401 TCTCATCTGC TGACCCAGAG CCTGAGCTAT GTCCACCACT AGAGTCCTGC CAGGAAAAAG
167461 TTGGATATAG AACAAGGTAA TCATCATCTA AAAGATTTTG TAAAACAACA TGCTGAACCA
167521 AGCAAAACCA ATACCAGTGT TTGGCACACA TGAAATTTTG TGTCTTATGA GTCAGGAAAA
167581 ATCAGGATGC CAGCTGGTTA TTAGAAACAG TTCATGGAAG AGGGGAATTC TGGTATCTTT
167641 TGAACAATGG TATCATGAAT CCAATTTAAA ATGATTTAGT ATTCATGTCA AGCTTTTAGC
167701 TTATTCTTCA AAACAGTTTC TCATATTTCT ATTGAAAGTG ATTTGAAGCT GACCCAAATT
167761 GCTAATTGTA GTCAATGCTG AAAGAATTGT CTCCTGTCCT CTGTAAACCC AACAAGTATA
167821 CTCATTCATT CTCGAGTGTT CTCAGGAAAA GGTTCTATGT AACTGTTTTA GCAAAAGATG
167881 ACATTGTCCT TACTATATGC CAAGTGCTAT TCTATGCATT CTATATTTTA ATGTCCTCAA
167941 AGCTTATAAC CACCTCCTGT GTATGTGTTT TAGGGAGGGA GGACACTGCT ATTATCCCCA
168001 TTTACAGATG GAGAAACCAA GGTGTGAAGA CATTAAGTAA CGTGCCCAAA ATTGCCCATC
168061 TAGTAAGTGA CAAAACTCAA TTTCAACATA AGCTGGTTCC TTTTCTTACT ACTTGGTGGA
168121 AAAGTAATTC AAATGGGAAT ATGATCATCG CAGTTATTAG CTGCTCCATG GAGTTTAAGG
168181 AAGAGCTGCC ATGAGCTGAG TGGTGGTCAT GATTGACATG TCCTTAGAAG GACTTAGAGC
168241 CTTCATACAA GACCACCTCT GCCTCATGGA GGACAGAATA AGGAGCCTGA CACTGGAGAC
168301 AACATTTTCC TCAAATTTAG GCAGGACAGA GAAGGAAAAA GGACATCAGG ACTATGCCCA
168361 TTCCTCCATG CTGCCAACAG CAAAGTCCCA CCTTCCTTAA TATGCTTTCT GGCAAGAAAT
168421 CTGGATGGTA CACAAAACCT CTCCCTCTGC TTCACCTTCC ACAACCAAGC ATTTCCAAAT
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CTTTGACTCT 
ATAGTTTGAA 
ACAGTGGTTC 
GGTTTGAGGC 
ATCCAGGTAT 
TTGCTTGAGC 
CCTGGGATAC 
TACCAGAGTG 
AGGTGTTCAA 
TAAATGTCCC 
TTTTCTTTAC 
CCCTACTGCA 
AAAAAATGCG 
AACACAAATG 
AGATAATTGA 
GTATTTCCAA 
ATAAGACTTG 
ACGTCTTCAA 
CAGATGAGCT 
GACATTTCTT 
CGGAGGCTAT 
AGTTGCTTTT 
CCACAGTTGA 
TCCATCATCT 
TGGCTCTAGA 
CCTGCACCTC 
TTCCAACCCC 
AGGCTGAGGT 
CCCTCATTAG 
TTTTAAAGTT 
ATTGTTCTTT 
GAACAAAATG 
CCTGATAAAT 
CTTTTCTCAG 
CCCAGGCTGG 
GCGATTCTCC 
CCCAGCTAAT 
GGTCTCGATC 
AGGCATGAGC 
TTGCCCAGGC 
TGGGTTTAAG 
CACCATGTCC 
ACACAAACAT 
TTAACTTAAT 
GATTGCTTAA 
GTTATCACAG 
AAAGCTGTAA 
AGAATAATTG 
GCCGAGGCGG 
AACCTCATCT 
CCCAGCTGCT 
GTGAGCCCAG 
AAAAAAAAAA 
ATGCAATTGG 



TCTTCCTGAA 
TTTTACTCCT 
ATGCCTCTAA 
CAGTATAAGC 
GGTGGGGCAT 
CCCAGAAGGT 
AGAGCAAGAC 
ATATTTTCAA 
TTTTTACGTG 
TCCCCACCAC 
TAGACTGTAG 
TGGCTTTACA 
GGTGTTTGGT 
GATAGAGATA 
GCAAGAAATC 
GTCATAATGT 
GTACTTACCA 
CAAGAGCTGT 
GCCCCTCATC 
GATCCGTCTC 
CCATATGAAA 
GGCTTGGGTT 
TGCTTACTGG 
TCTTGGTGCT 
GATGGAGATG 
TATGTGATGA 
TATTATCTCA 
TGTTTGGGCC 
CAAGCAGTTA 
GTTTGCCAAG 
GTATTACAAA 
CTTTTAAACA 
TTTGCATACC 
TCTTCTAACT 
AGTGCAGTGA 
TGCCTCAGCC 
TTTTGTATTT 
TCTCGACCTT 
CACCGTGCCC 
TGTAGTGGAG 
CAATCCTCCT 
AGCTAAAGTC 
CTAGCTGTAT 
AAAAATTAAA 
TAGTTGCATG 
AAAGTTACCT 
AAGAGAGAAC 
TTACCAGGCC 
GCAGATCACC 
CTACTAAAAA 
CAGGAGGCTG 
ATTACACCAC 
AAAAGAATAA 
GTGATCTGTG 



TCGTGCTTAA 
TGATATTCCT 
TCCCAGCATT 
AAGAAAGGCA 
CCCTGTAGTC 
TGAGGCTGCA 
CCTACCTCAG 
TGTCACTGAC 
TCCTTCAGGA 
CAAAACCAGG 
ATACCTAAAA 
TATTGTGGTT 
TTGAGAACAA 
AGAGTCCAAC 
ATCATAAACT 
TAGGTTTCAA 
AAGCTCCCGG 
GGTGTGCCCT 
TTCGCAGAAC 
TTTGAGGGCT 
TGGAGCCCGA 
TTTAAAGAAG 
GTTCGTCATC 
GGTGGTTGAG 
AAGCAGCCAG 
GCTGGCTGCA 
TTTTGTATTG 
ACGTTTGAGA 
CAAGTGGTTG 
AATTTACATT 
TCTCGGGAAT 
TGGGGTCTTA 
TCACATAGCT 
TTTTTTTTTT 
CGCTATCTCG 
TCCCGAGTAG 
TTAGTAGAGA 
GTGATCCACC 
AGCCTCTTTT 
GGCAGTGGCA 
GCCTCACCCT 
TTCTCTCCAG 
AGCTAATACA 
ATGAAAAAAT 
TGACTAGTGG 
TGGACCAAGT 
TCAGGGAGTG 
AGCACGGTGG 
TGAGGTCAGG 
TACAAAAAGT 
AGGAAGGAGA 
TGCACTCCAG 
TTGGTACCAG 
ACAGATTCCA 



AATCTGCCCT 
TTTATCATAG 
TTGGGAGGCT 
GACCATGTCT 
CTAGCTACTT 
GTGAGCCGAG 
AAAAAAAAAA 
CCTTCATTCC 
GTTACTTCTA 
GACCTCCAGG 
GGTGATGGGT 
TTTCAAATGA 
CCTGTTCTAA 
CATCCCATTG 
ATTTTTCAGA 
GTTAAATCAT 
GCCCACACAC 
TTGTGCTGTG 
AGGTGGAACT 
TCAATGAGGC 
CACTGGGGAC 
TCTGTTATAC 
AGGCTCAGGC 
GCCATAGCTT 
AATTTTCCAC 
ACTGACTTCC 
AAGAAAAGAG 
ACTGCAACCC 
TTTAGAGGAA 
AAAATAGCAT 
ATGTAGGTAA 
ACTGAAGACC 
CAGACTGCTC 
TTTTTAATGA 
GCTCACTGCA 
TAGCTGGGTC 
TGGGGTTTCA 
CGCCTCAGCC 
TCTTTTCTTA 
TGACCACAGC 
GGCAGAGTGG 
AAAGAAGAAA 
GTAGCCACTA 
TCAGTTTTTC 
CTACATAACA 
GCTGGGAGAA 
TGAAACTCTT 
CTCACGCCTG 
AGTTTGAGAC 
TAGCTAGATG 
ATGACTTGAG 
CCTGGGTGAA 
AATTACTCTT 
TTGAAGGAGT 



CTCCTCCCTT 
ACATGCCACA 
GAGATGGGAG 
CTACAAAAAA 
GGGAGGCTGA 
ATTGCACCAT 
AAAAAAAAAA 
CCAAATGAAA 
AGATGAACCA 
CAGACATTTT 
CTTTCTTCCC 
TATTCATGGT 
AGCAAAAAGA 
AAGGTCAGGA 
AGAATGACAT 
CTCAGCTCCT 
TCACCTTGTA 
GTGCCCGCTC 
GCTCTCCGTG 
TTCCCAGCTG 
AGCAGAATGT 
ACAAGTGGCA 
AGATGGAGCA 
TTATTGAAAA 
CGTGATGAAA 
ATAGGTCTTG 
GACCTAAAAG 
AAGTGCAGAG 
AAAAAGCAGT 
AAGCTTTTGA 
TAGATGAGGC 
TATACTCCTG 
TAAATTATTT 
GACGGAGTCT 
CCTCCGCCTC 
TACAGGTGTG 
CCATGTTGGT 
TCCCAAAGTG 
TAAGACAAGT 
TCACTGCAGC 
CTGGGACTAC 
TGCATTGGAA 
TCATGAGTAG 
TGTTCCAGTT 
GCCTCAATAT 
GCAATGCAGG 
TCCTATTCTA 
TAATCCTAGC 
CAGCCTGACC 
TGGTGGTGCA 
CTCCGGAGGG 
AGAGCGAGAA 
TGTAATTAGT 
ATGGGGAGCT 



TCTTATACGG 
GTAGCTGGGC 
GGAGACCAGG 
TAAAAAAATT 
GGTGGGAGGA 
TGTACTCCAA 
AAAGTAGAGG 
ATCCCCCAAT 
CTCTCTACCC 
TGATGGTTTG 
TGTTTTCAGG 
GTGAAACAAG 
AATTCATCAT 
TGGACAGTCT 
GATGAAAGCT 
GGGGAGCAGG 
GCCCTGGCAT 
ACAGCGCCAG 
TTCCTCACAT 
CTTGTTGGGT 
CTCCTGCCTC 
GTAGCTGTGT 
GGTGGCTTCC 
GCTCCAATAT 
ATACACCTCA 
AAGGTTTTCC 
GAAGAAGTTG 
TTTCAAGTTG 
TTTAAAGCAG 
CTGGCTATAC 
AGCCAGTCAG 
CCTCACTTGT 
CATTATTTTT 
CACTCTGTCA 
CCGGGTTCAA 
CACCACTACG 
TGGCTAGGAT 
CCAGGATTAC 
TCTCGCTCTC 
CTCGACCTCC 
AGGTATGTGC 
TTTAGAGGAT 
GAATTTAAAT 
GCCACATTTT 
ACAACATTCT 
CTTCCTCACA 
GTTAACTTCA 
ACTTTGGGAA 
AACATGGCAA 
CACCTGTAAT 
GGAGGTTGCA 
TCTGTCTTAA
AGTAACACTT 
TCACCCCAAT 
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ATATGACTCC 
TGGAAGAGAC 
GTTTTTCCTT 
AACCAGACCT 
CCAAAGAGAA 
TCATTTATTC 
TATCACCCTT 
TATACAGTAT 
ATTTATACAT 
GGTAATCACT 
AATTAGCCTG 
TTCCCTTGGC 
CCCCCACAAA 
TTTGTTGTTG 
AGAAAGAGAA 
TCCCTGGCTA 
CAACCAGAAC 
AATAAGAATT 
TATGGGATCA 
AGGACCTATC 
GGAGGTAGGG 
CAGGAGCGAA 
CCCTCCCCCC 
AGTCAGGAGG 
GGGTAGGCTG 
TGAAGACGTG 
TGGGGTCACT 
GTCCTGGGCC 
AAGAAAGAGC 
GTTGGAGGGG 
TCACATATCC 
ACTTACATAT 
TCAGACCGGA 
GAGAAAACTA 
AAACGTAGGA 
CCTGTAAACT 
TCAGTTCACA 
CTTAAGCCTT 
AACTGCTCTG 
CACAAAATTC 
CTCCTTGCTG 
GAAAATGGGA 
CCCTGGGACC 
AGTCCTTAAT 
CACCCGTTAG 
CTTGTCACAT 
TGGCTTAAGT 
GAGGAGAAAT 
GCTGACCCTG 
CTACTCAAAA 
ACTCTTCAGC 
GATTCATCAT 
CCTGAGGCAA 
CGATGAGTCC 



CTGGTATAAT 
TTTTCCCCTA 
CCAACCCCTA 
AATCAGACAC 
CCATTTACAA 
TCCCTAGTAA 
CCCCTCTGAA 
ACATACATAT 
AAGTATTTAT 
CTGTGATTCT 
CCTTTTGTGA 
TCCTACACCA 
GAACAACAAC 
TTGTTGTTGT 
AGGAGAATAG 
ATAACGTCTT 
AACCAGAAGA 
GGAAAGAAGG 
GAGCTCCTGC 
TCAAGAGACA 
AAGGCAGAAA 
AAAGCCTGCC 
CGCCCGCCCC 
AAGTTTGAAG 
TTTTCCTCTC 
TATTCCTTGG 
GCTCTTCTGG 
CCACTCATCT 
AGGAAAGAGG 
CCCTGCTGTC 
ACTGAGAAAA 
TCGCTGCTAG 
TTAAACTGAG 
GTGACGTTGT 
AGAAAATATC 
ATCATGTGAC 
GACTCTGATT 
CCTAGCTGAT 
AAGGGTGTGG 
ATCTGAGTCA 
TCCTTCTGCA 
GTCACTAGTG 
ATCACTCTGC 
GACTTAGCTC 
AATTATTATT 
TTATAAGTCT 
AGATGCAGTG 
TAAACTTGAA 
ATAGCCAATG 
TAAAGGCAAG 
ACTGCACCCT 
GCCCTGGCAT 
CCAGCACACA 
TTGCAGATAT 



GAGTATTTTG 
TCTACATAAA 
TTATCTCATT 
TTTCACAAAA 
GTTAAACTCT 
TCATTTACTG 
ATAAATATGT 
TTATACATAC 
AAATAAGGCT 
AGCCCATGTA 
GTCGATTTTT 
TCATGACAAT 
CAACACTGGT 
TGTTGTTTTT 
TGAATACCTC 
GCTAGAGACC 
ACCAGTTTAT 
CTGCAGAGCA 
AGAACTGGGG 
TGTTCAGAGT 
GAAGATGGGG 
TCTTCTGAGA 
CACACCCCTA 
AGTGCCTAGA 
ACAATTTGAT 
CAGGCTATTT 
GGAGATGGGG 
AAGTTCTGAA 
TGAGAGCTGT 
ACGAAATATA 
CCTTAGCCTG 
TCCCCTCTGT 
AAGTGAAACT 
TCATATCATT 
CTTCTTTTAC 
CCCAACACAG 
TGAGATCTTT 
GTTACTTCTT 
TGGAAAAAGG 
GCTTTCTATT 
GGACTCAGAT 
GCCCAGCAGT 
TTTGTGCTTT 
CAGCTTCTCC 
TCATGGGGAA 
CAGGTGTAAG 
GTCCAAGGGA 
TTCTGGGAGC 
GAACATGGAG 
ATTGGGAAAC 
CCTGGGTGCT 
GATGGTTGCA 
GAGAGAGGAG 
CTACAACTTT 



AATTAAAGGC 
GACCAGTCAC 
TTGTACTGAA 
TAATGTCTGT 
GTTCCTCCAT 
CCCCTCAAAG 
ATACATGTAT 
ATACATATGC 
ATATAAGTAT 
CTTGTTAATA 
CAGTGAACTT 
AAAATTTGAC 
TAATAAGGTC 
GCTTTCAGGA 
TTCTGCAGAG 
CAACCAGGAG 
CCTTTTTGTG 
GAGGGTTTGC 
AGTTTACTTT 
GATTGCAACA 
GAGGCCAGGG 
ACCTAGCTGG 
CTCCTGGGAG 
ATAAAAAACA 
CAGTCTCTTG 
CCTCCAGTGA 
CTCCCCTCCT 
TCTTCTGAGA 
AAAACAAAGA 
TTCCCCACCC 
GACCTTTTCC 
TGCTGCCACT 
ACTGTGGGAG 
TGCACTCCGC 
AGCAATAAAA 
AGTATCTAAA 
CTACTTTTGC 
TTGCTATTTA 
GGTGGTAACA 
CTTCTCTGTC 
CTTCTTCAAT 
GAGTGCCCCC 
GTGGAGAAAA 
ACTTCAAAAT 
AAAAGATGGA 
AGGCATTTAT 
ACCAGTAAGG 
CACTGGCCTG 
TTTGGCCCAG 
ACGTTCCTTT 
CACAGAGCCT 
GACCCCATGC 
AAAGAATGAG 
CATTGTTGTG 



CCTTAGAGAT 
ACTAGACAAG 
GAAAAGAGGA 
CTCTCAGGCT 
TCATTCATCC 
AATTACCTAT 
AAACGTTATA 
ATACATATTT 
CTACCCCCAT 
AATTTGTATG 
CAGAAGGCAA 
TCCACCTCGA 
GGTTGTTTTT 
GCAGAGGTAT 
AGGGGTGCCT 
GATAATGGAA 
CCCTCTCCCT 
TCCTGAGGAG 
TACTATCTCT 
TAAAGAGTTT 
ATAGGCAACA 
GCTCTCCCTG 
CTCCTCTAGG 
GTAATTTAAC 
AAGCCACACA 
TACACCAGGC 
TCCAAGGCTC 
TTTGGTGTAA 
AAGTCCTGAC 
CACTTGCCAT 
GTAACCTTCA 
TCCTGGGTCA 
GCGGGGCTCA 
CTCTCCGGTA 
AGAAGGAACC 
AACAGGAAGC 
CACCAACTCC 
TGGGTTGCTT 
GCAGTAGGAC 
CCGTTCTGTG 
AGCGAGGGTC 
AGCTTAGAGC 
GGCTGTGGGG 
GAAAGGAAAA 
TTACTATCTC 
GATAACAACA 
GGAGCTCAGG 
TCTGGGCCCC 
CTGCAATCCC 
CTTCCTATAC 
TCTGTTGTTT 
ATAGCATGGG 
CCCCTGAATC 
GATGTGACTC 



CAGCAGATGC 
AAGAACAATT 
CTAAGAATGT 
CATTCATTTT 
TCCCAAATAT 
ATTCTCCTGA 
CATACATATT 
ATATTTATGT 
TGGCAGAGGG 
CCTTTTCTCC 
AGGGGAAGTG 
CCCCCCCCAT 
TGTTTGTGTT 
AATAGGCAAA 
AAGTGGGACT 
GCAATCAAGG 
AAACTGAGGG 
CAGTTATTTC 
TCTCCAGGAC 
GCAGACCCAA 
GAGGAGTGAC 
TACCCCCGAT 
ACAGGGGCAG 
TACAATTACC 
GAATTTCTTC 
CCCTCTCTGC 
CAGGGTTCCT 
AGTCTGGTGA 
CATTTTCAGA 
CAGTACACAC 
CTGCTCAGAC 
GGAAGTTAAC 
TAAGATTTAG 
AAGGAGGGGG 
AATTAATAAC 
CTGCAGAGGT 
CTTGGGAGTC 
GTGGTTCTAT 
TCATTGGCAT 
TCTTGTTTTT 
AGCCAGGATA 
TGTGTGGGAT 
TCCAGGGTCA 
GTACTATCAC 
ACAATAAGAG 
TAATAAATGC 
ACACAGGTGG 
TGGCCTGCCT 
TCTGGTCCAA 
CAAGCAGAAG 
TGCCACCTAC 
ACATTCTACT 
CTTGGTCCCA 
TGTACCCAGG 
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174961 CATGGCTCAT TCCAGATCTG TCCTATTGTC AGAGGTGTTC AAACCAGAAT GACTCCATTT 

175021 TGAATGGGGG CTAGGTAAAA TAAGGCTGAG ACCTACTGGG CTGCATTCCC AGGAAGTTAG 

175081 GCATTGTAAG TCACAGGATG AAATAGGCAG TTGGCACAAG ACACAGGTCA TAAAGATCTT 

175141 GCTGATAAAA CAGGTTGCAG TAAAGAAGCT GACCAAAACC CACCAAAATC AAGATGGCAA 

175201 CAAGAGTGGC CTCTAGTCAT TCTCATTGCT CATTATACAC GAATTATAAT GTGTTAGCAA 

175261 GTTAGAAGGC ATTCCCACCA GCTCCATAGT GGTTTATAAA TACCATGGCG ATGTCAGGAA 

175321 GCTACCCTAT ATAGTCTAAA AAGGGGAGGA ACGCTTGGTT CTGGGAATTG CCCACATCTT 

175381 TCCCAGAAAA CATATGAATA ATCCACTCCT TGTTTAGTAC ATAATCAAGA AATAACTGTA 

175441 AGTATCTGTA TTAGTCCATT TTCACACTGC TGATCCAGAC ATACCTGAGA CTGAGTAATT 

175501 TATACCAGGA AAAAATGTTT CATGCTCTTA CAGTCCCACG TGTCTGGGGA GACCTCACAA 

175561 CCACAGCAGA AGGCAAGGAG GAGCAAGTCA GGTCTTACAT GGATGGCAGC AGGCAAAGAG 

175621 CTTGTGCAGG GAAATTCCTT CCTATAAAAC CATCAGGTCT CATGAAACTT ATTGACTATC 

175681 ATGAGAACAG CAGTATAAAT TACTCAGGGA AAGACCTGCC CCCATGATTC AATTACCTCC 

175741 CACCAGGTCC CTCCCACAAT ATGTGGGAAT TTAAGATGAG AGTTAGGTGG GGACACAGCC 

175801 AAACCATATC AGTATCCTTA GTCCAGAAGC TGATGCTCTG CCTGTAGAGT AGCCATTCTT

175861 TTATTCCTTT ACTTTCTTGC TTTCACTTTA CTGTGTAGAC TTGCCCCAAA TTCTTTCTCA 

175921 CACGAGATCT AAGAACCTTC TCTTAGGGTC TGGGTTGGGA CCCCCTTTCT GGTAACACTA 

175981 TCAAAGGATC AGGAAAAGGA AGCTAGTGAA TGCTAAAAAG GAAACAAACT ACCATTACCA 

176041 ATAATAACAG CAAGACAAAA GCAAAACGGA TTGTGACAGC TGTCCCATCT CACACCTGTT 

176101 TCCCATTGCA GGAAGGAGGG GCTGGTTCAT GCACAGAGTG GCCAATATTA GAAGCAGAGA

176161 GGGGGTGCAG ATGAGACTTC AGGAATATGT TGACAAAGGC AGGCCTAGGG AGAAATCAAC 

176221 CTGAACTATC CCCAAGGAGG AATGCATTAT CTCTAATATG TAAAGTTAGG CTTGATCCTG 

176281 TGATTATGGG ATATAGGAGT CCAAAGACTC ACAATGGGAA GTAGGTCACT AGAGTCTCCT 

176341 TCAGAAGCTC TGTACTGTGT GTTCCCACTG TGGGCAAGAG TCAGCACTCA GCTATTCCTA 

176401 GAATGCCTTT CCTCAACTCC TTCAGATTTT GCCTCTCAAC TAACCCTATC CTGACCACTT 

176461 GTTAGCAAGT GTACCCCTCT CTCCCTCCCA AACATTTTCA AATCTATTTT GTTCCCATGG

176521 CACTTATCAC TGAATATTTT ACTAATTTAT TTTGTTTAGT GTTTGCTTCC CTCATGAGAA 

176581 TGCAAAGGGA TGGATTTTTT TCAATATTGT TCACTGATGA ATCCCAGTAA CTAGAATATT 

176641 TCTAAGCATA GTGATGTGCA TTAAATCAAA GAGTAACTTT CTGAATTGCA CTAAACACAC

176701 ATCACAAGAG GTGTGTGCAC ATATGTGCAT GATGCACGTA GTGTGGTGTG GGTGTTGTGT 

176761 GGGGTATGTG GTACTGTGTG TGCTGTGTGT GGTATGTGAT ACATAGTTTG TGTTAGTGTG 

176821 ATGCATGTGA TGTGGTATGT GTGTGCGTGT CCATACATAT TAGGGGTGGC GGGGATGTTA 

176881 ATATGTCAAA TGGTACTAGA AAGTATCAGA ACTCATGGTG CTTACTGGTT TCCCAGAGAG

176941 CTGCTTCTCT CCCACCTGTA GGATATACTG ATGGTTTGGA CAGAGAAGAA ATAAAAAGAA 

177001 GGCTGTGACC TACTGGGCTG AGGAAATAAA AACGAAAGTA AAAGAAGAGC TGGGAAAAGA 

177061 GAGTGGAGGG GCCAAGGGAA ATTTCCCCTT TGGCTTCTGG GGAAACTTTG CTGAAAAATC 

177121 AACTCACAAA TTTATTAACA TGTACACAGG GAGAACCATA GAATGATTAT CCACTTCCCA 

177181 AGAGGGCTTA AAAGCTTATA TATTATCCTG GCAAAACAGA TTATGGGAGG GGAAGAAGAG

177241 AAACTCTGTT GATGGGATTA CTGTTGCGGA TTTTTGCTCC TTCGCTCAGC TAGGTCCGGG 

177301 TTTTTGTCTC ACAGCCAGGA AGAATTAGGC ATGCAGCCAT CAAAGAATGA GTGGAGTAGA 

177361 ATTTATTAAG TGAAAGGAAA GCTCTCAGCA AAGACAAGGG TCCTGAAAGC AGATTTCTGG

177421 TTTGCTCTTC ACAGTTGAAT ACTAGGGCTT AAGACTCAAA TTCCTGACAA CTCCACCCTG 

177481 TCCTACCAGT GCATGCAGGC CTTTAGACTG AGCTACTCCA TATTGATTAA TTTCCTGAAC 

177541 TGCGCATGTG TTAAGGAAAG GAATCATCCA CTGCAGGCAT GTTTAGGCAA GCCCCCTGTG 

177601 CAAGTTCCCT TATCTGCACA AAACATCCGG TGTAAGCACT TGTGGGGCAG GTCAGAGGTT 

177661 CTCTGGGTAC CATTCCCTTA CTGTCTGCCT AAAGCAAGCT GGCCAACTCC TTTCATTACT 

177721 AGGGAGAGTA AGTAGATCAG GGAACAGAGA TTAACTTGAA CATTATCTTG TGAAAGTCCG 

177781 TTCGGGCATG GTTACATTCT TGGTCTTACA GGAAGGGTAA ATAAAAATAA TTGCTCTTTT 

177841 TGGTGGGTCT GGATCTTAGG TAGATAAAGA AACTTTAATT CCACGATGTG TTTTGGTAGG

177901 GATAGTTGGT GGCAGGGATG TCAGAGAGAC TTTGAGGCTT CTTCAGTTCA ATATGACCAA

177961 GGGCCATATA TTAGGGTATC AATTTCTGAG CCCCAACAAG AGCTTAGGAG AGATGTGATA

178021 GCATCACAGT GTGAAAGCAA TTTTTTGTCT GTTTTTAGAG ACAGGCTCTT GCACTGTCAC 

178081 CCTGGCTGAA GTACAATGGT ACGATCACAG CTCACTGTAA TCTTGAACTG GGTTCAAATG 

178141 ATCCTCCCAT CTAAGCATTT CAAAGTGTTG GGATTACAGG CATGAGCCAC GGTACCCAGC 
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AAGCAAGAGG 
AAGAACAAGT 
AGCAAAATTT 
CCAAATGAGG 
CATATGTGAG 
TAAAAAATCA 
CATTAACTAA 
TCAGCAGACT 
TCCCTTAGGC 
ACTATGTGAG 
AAATAGTTAA 



TGCTGGGTTA 
CACAAGGTAC 
AAAGCCCACC 
TACACTAATT 
ACAAATACCA 
TAGTTCTGGG 
ACATAATCCA 
TATGGAGTAG 
TCAGGCTGGA 
GTTCAAGCAA 
ATGCCTGGCT 
GTCTCGAACT 
CAGGCATTAC 
GAGACAGAGT 
TTCACTGCAA 
CTGGGACTAC 
GTCTTGCCAT 
AGCCTCCCAA 
CTTAATAAAC 
TTCCAATAAC 
GGAGCTCATG 
ATGGTCACTC 
AGAGGGCTTT 
GTCAATTGAA 
GAGCCTTAGA 
AAGGAGAAAA 
CCCATGTTTG 
AGTAGGATAC 
GAAGAAGATG 
CAAAAGATGT 
GTTATAATAA 
ATAACATTAT 
GACAGAATTG 
ACTAGTTAGA 
ACCCTAACAT 
AGCTCACATG 
AAATGTAAAA 
GAAATCAATA 
CCAAGGCGGA 
CCTGTCTCTA 
CAGCTACTCG 
GAGCTGAGAT 
TTAAAAAAAA 
AAAATGAAAA 
GTATAGATCA 
TTTAGATTAA 
GTGAAATAAG 
AGTTTTATGA 
ACCGTGCAAT 
AATATTGTAG 
AGAACAATAA 
ATATTGAAAT 
TAAACAGATA 
CAATGTAATG 



GGTTAGGCAT 
CCGTCACAAA 
AAAACCAACA 
ATACTGCATT 
TGACAACATC 
AATTGTCCAC 
GAAATAACTA 
CCATTCTTTT 
GTCTGGAGTG 
TTCTCCTGCC 
AATTTTTGTA 
CCTGGCCTCA 
CCACTATGCA 
CTCACTCTGT 
CCTCTGCCTC 
AGACATGTGC 
GTTTGTCAGG 
AGTGCTGTGA 
TTGTTTTCAC 
CCTTTTGTGT 
CTGCTGCTCA 
CAGCCTGAAC 
AACAGCAAAT 
ATGATCTACT 
GACAGGGGAT 
GTGAGAGGAC 
GCAAAAAAAC 
ACTCAAAGAG 
AATCTTGAGA 
CTGGAGTAGG 
ATAAAGAAAG 
AAACATACAT 
AAGGGAGAAA 
CAAGATCAAC 
AAATCTATAG 
AAACATTTTT 
GGACTATAAT 
ACTAGGCTGG 
CAGATCACGA 
CTAACAAAAT 
GGACACTGAG 
CGCGCCACTG 
AAAAGAAACT 
ATTTCAAAGC 
CATATTTCTC 
TGAAAGACCT 
ACAATTTAAT 
TACATTTTGT 
TAAATGGTAG 
TATTTTTTTA 
AAAATATTTT 
TCCTATTTAT 
ATATTTTTTC 
TGTGATTTAT 



TCTAACCAGG 
GACCTTGCTG 
TGGCCACAAA 
AGCATGCTAC 
TGGACGTTAC 
CTCTTTCCTG 
TACGTCTGCT 
CTTTTATTTT 
CAGTGACGTG 
TCAGCCTCCC 
TTATTAGTAG 
AGCGATCCAC 
TGACCCATTC 
CACCCAGGCT 
CTGGGTTCAA 
CACTACACCC 
CTTGTCTCGA 
TTACAGGCAT 
TTTACTGTAT
GTGAAAGAAT 
GACTGGAGCA 
GACAGCATGA 
TTGAGCAGCA 
CTGAAAAACA 
ACCATCAAGC 
AGGGAGAGAG 
ATTAACTTGC 
ATCCATACCT 
GCAGAAAGAA 
TATACTAATA 
GTATTTTGTA 
GCACCTAACA 
TAGAAAATTC 
AAAAAAATAG 
GTCACTACAC 
CAGGATAGAC 
AATAGAGTAT 
GCGTGATGGC 
GGTCAGGAGT 
ACAAAAATTA 
GCAGGAGAAT 
CATTCCAGCC 
AGAAAAATAA 
AGCCAAGAAC 
ATAGACACAA 
ACAATTCTGT 
ACAGAGAAAA 
ACTGTATATG 
ATTGTCTTGC 
TCTCCCTGCC 
TTAAAAGTCC 
ACAAAGGAAT 
TCCATAAAAT 
AGCATTTAAA 
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187921 AGTAAAACAG GCCGGGCACA AAGGTTCGTG CCTGTAATCC CAGCACTTTT GGAGGCCGAG 

187981 GCGTGCAGAT CACTTGAGGA CAGGAGTTCA AGACCAGCCT GGCTAACATG GCAAAACCCC 

188041 ATCTCTACTA AAAATACAAA AATTAACCAG GCGTGGTGGT GCACGCCTGT AATCCCAGCT 

188101 ACTCTGGAGG CTGAGGCACA AGAATCACTT GAATCCAGGA GGTGGAGGTT GCAGTGAGGC 

188161 AAAATTATAC CACTGTGCTC CAGCCTAGGC AACAGAGCTA GACTCTGTCA CACACACACA

188221 CACACACAAA AGAAAAGTGT ATGACAACAA CAGTGCAAAA GAAGCGGAAA TGAAAATAAT 

188281 GTTATTTTAT ATAAGTGGTA TACTTTTAGA TGAACTACGA TAAATTAATG ATGTATACTA 

188341 TAAACTCTAA GGCAACCACT GAAATAATGA AACGAAGAAT TATGGCTAAC AAGCCACAAA

188401 AAGAAATAAA ATAGAATGAG AAAAAATATT TAAGTTGTTC AACAGATGGG AAAAAAAAGA 

188461 GGAAAAAGAG AACAAAGAAC AGATGGGACA AATGGGAAAG TAATAGCAAG ATGATAGACT 

188521 TAACTCTACC CATATAGATT ATCACACTTA AGGTAAATGA TCTAAATACT CTAATACAAA 

188581 AGCAGAGGTT GTCAGATTGA ATTAAAAAAA CAGACAACAA CAAAAAAAAG CAAAAAAAGA 

188641 GCCACAACAT GCTGCCTACA AAAAATTCAC TTTAATATAA AGACACAAAT AGTCTAGAAC 

188701 ACCATCACTT TTAACCTTAT TTACTCAAAC CTCCTAACTG ATCCCTATTT ATTTATTTAT 

188761 TTATTTATTT ATTTATTTAT TTATTTTTGA GACAGAGTCT GACTCTGTTG CCCAGGCTGG 

188821 AGTGCAGTGG CACCATCTAG GCTCACTGCA GCCTCTACCT CTCGGGTTCA AGCGATTCTC 

188881 CTGCCTCAGG CCTCCCAAGT AGCTGGGACT ATAGCACATG CCACCATGCC CAGCTAATTA 

188941 TTATATTTTT AGTAGAGACG GGGTTTTGCC ATGTAGGCCA GGTTGGTCTC AAACGCCTGA 

189001 CCTCAGCCTC CCAAAGTGCT GGGATTACAG GCGTGAGCCA CAGCACCCAG CTCCTCTTCA 

189061 TTTATTCTTG CTACGCTTCC TCCAATCCAT TTTGTGCATT TGATGATTTT GCCAGTAACT 

189121 TCTTTATTTT TCTGGTAAAA TTACTTATGG GTCACTGAGG ACTGGGATGT TCTTTCTTCT 

189181 AGAGGGGGTT TGTGTCTGCT TTTGCCAGGA AGCTGGGGTA CCACCAGTCA AGTATTACTT

189241 TAAACTCAAT TCATGAATTG AGACTTTTTT TTTTTTTTTT TTTTTTACGC AGAGTCCTAC

189301 TCTGTCACCC AGGCTGGAGT GCAGCGGTGT GAACATGGCT CACTGCAGCC TCAACCTACT 

189361 GAGCTCAAGC AATCCTTCTG CCTCACCATT CTGTATAGCT AGGACTACAG GTGTGTGCCA 

189421 CCATGCCTGA CTAATTTTTT AAATGTTTTT TTTAGAGATG GGGCTCACTT TGTTGCCCAG 

189481 GCCGGTCTCG AGCTCCTGGG CTCAAGTGAT CCTCCCACCT TGGTCTCCCA AAGTGCTGGG 

189541 GTTACAGGCA TGAGCCTCTG TGGCTAGCCA AGACTTTTTA TTTTTTAGCC TAAATGTGTA

189601 TAAAAGTTGG CTTGTGGTTA CAACTTATCA GGATTGATGA TCTCTCTCTC TCTCTCTCTC 

189661 TCTGTCTCTC CCCACCTCTC TCACATCCCT TGCTCTGCTG AGAAGCAGAG CAAACATTCT 

189721 AGCAGTTTCC AGAGAGTAGG ATGGGATTAC TTCTAGTTTA CTTTTATCAT CCTTTGGGAT

189781 CGCAGTATTA CTGGGAGAAC ACAAGTATCT CTTATTAGAC ATACCACCTT TGTAGAATCT

189841 GGACTTTCAT TTTAGACTTT ATTTGTTTTC TACTATAAGC AATTTAAGTT ACAGATCTCT 

189901 CTACACACTG TTTAAGTTGC ATCCCATGAA TTTTGATGTG CTTTATTGTC ATTATTATAT

189961 AGTACAATGT ATTTTGTAAT TTTTTGTGAT TTGTTTGGAG AGATTGATTA ATTAGAATGA

190021 TGTTTAATTT CCAAATATGT GTGTTTTTTT CCTACATTTC TTATTTTTAT TGATTTCAAA 

190081 TTTATTTCTA CTGTAGTCAG ATTTAATAAT TCATTTATTT TTATTATTTT CATTTTTTTA 

190141 GAGACAGGGC CTTTCTGTGT TGCCCAGGTT TGTCCCAAAC TCCTAGTCCC AAGCAGTTCT 

190201 CCTGCCTCAG CCACCCAAAG TGCTGGGATT ATAGGCACGA GCCACCCGTG CACAACCAAC 

190261 AATTCATTTA AAAAGTGGGC AAGTGAACTG AACAGACATT TCTCAAAAGA AGGCATACAA

190321 TTGGCCAACA AATATATGAA AGAATGCTCA ACATCACTGT ATTAGTCTGT TTTCATGCTG

190381 CTAATAAAGA CTTAACCTGA GACTGGGGAA TTTACAAGAG AAAGAGGTTT AATGGACTTA 

190441 CAGTTCCACA TGGCTGGAGA GATCTCACAA TCATGGTGGA AGGCAAGGAG GAGCAAGTCA 

190501 CATCTTACAT GGATGGCAGC AGGCAAAGAG AGAGCTTGTG CAGGGAAACT CCCGTTTTTA 

190561 AAACCATCAG ATCTCGTGAG ACTCATTCAC TATCATAAGA ACAGCATAGG AAAGACCCGG

190621 CCCATAATTC AGTCACCTCC CACTGGGTTC CTCCCAGGAC ACATGGGAAT TGTGGGAGTT 

190681 ACAATTCAAG ATGAGATTTG GGTAGGGACA CAGCCAAACC ATATAAATAA CTAATCATCA

190741 GGGAAATGCA AATCAAAACC ACAATAAGGT ATCATCTCAC CCCAGTTAGA ATGGCTATTG 

190801 TCAAAAAAAC AAAAAATAAC AAATGCTGGT GAGGATGTAC AGAAGAGGGG ACTCTTATAT 

190861 CCTACTGGTG GAAATGTCAA TTAGCATAGC CATTATGCAA AATAGTATGG AAGTGAGGTA 

190921 GGTTACATAG GGTGGTCACA GCCTCCCTTG AAAGGAAACA AGAAACTTGT CAAATTGATG 

190981 GAGAGAACAA ATCTCTTGAC ATTACACAAA CTGCATCTGG GGCTAGTGGT TAGAATATCC 

191041 TCAGTCAAGG AGGTAGAAGA GCAGGAGGGA AAATCCCTAA GTTCGTGCAA GTGCAGAAAC 

191101 CCACAAGCTG TGTTCTCAGG TTGACATATA CTCATTTTAA TAGTAAGAAA CACACCCTTG 
191161 GGTAGAGAAT TAAAATGCTA ATAATACATG TGATGTATGT ACTAGCGTGT ATGGCAATAT

191221 TGCATGCACA TTCAAGAGAC CACCCAAAAC ATATTTAACA ACAATGCCCA TTCCCACCCC

191281 CTCATGGATA ATCACGTAGG ACTCCCATAA CGGGAGTTTC TTCAGTGTCA ATTGGTGCTG

191341 AAGTAGCCGA CCCTGACTCT GCTATCAGCG TGTACTTTCA CCTTGCAATA AACTCCTTTG

191401 CCTACTTTTA CTTTGGACTG GCTTTCAAAT TCTTTTGTGC AGGGAATTCA AGAATCTGAA

191461 CCAGCCCACT GACAACAGAG GTTTCTCAGA AACCTAAAAA TAGATCTACC AGATGAGGCT

191521 GAAAATCTGC TACTGGCTAT TTATCCAAAG GGAAGGAAAT CAGTATACAA AGAGACACCT

191581 ACATCCCCAT GTTTATTGCG TCACTCTTCA CAAGAGCTGA TATATAGAGT CAACCCTAAA

191641 TGTTCATTAA CAGACAAATG GATAGAAAAT GTGGCATATA TACACAATGA AATACTATTT

191701 GGCCATGAGA AGAATGCAAT CTTGTCATTT GTGGCAACGT AGATGAAACT GGAGAACATT

191761 ATGTTAAGTA AGATAAGCTA GGATTGGAAA GATAAATACT ACATGTTATC ACTCATATGT

191821 GAAAGTAGAG AAAAATTTTT AGCTCATGGA TTTAGAGAAC AGAACTGTGG GTACCGGAAG

191881 CTGGGAAGGG TAGCAAGGAG GGGAGGATAG GGAGAGGTTG GTTAATGGTG ACAAAATTAC

191941 AGCTAGATTG TAGAAATGAG TTCCGGTGTT CTGCACCATT GTAGGGTGCA TATGGTTAAC

192001 TCTCATTTAT TGTATATTTT CAAAAAGCTA GAAAAGAATT TTGAATACTC ACAACAAAAT

192061 AAATGATAAA TGTTTAAGGT GATGGATATA CTAATTACTC TGATTTGATT ATTACACATT

192121 GTGTACACAT ATAAAAATAT CACTCTTTAT CCCGTATATA TGTACAGTTA TTATATGTCA

192181 ACTAAAAATA AAAGAAAAAA AGAATATGAT CTATCATGAT GTATATATCA TGTGTACTTG

192241 AGCAAAATGT GCATGCAGAT ATTGTGTATA ATGTTCTATA AATCAATTAG CTCAAGATAA

192301 TAGATAGGAT TGTTCAGATC TTCTGTGTCT TTACTGATAT TTTGTCTAGT TATTGCATCA

192361 TTACCAAAAA AAGGGTGTTA AACTCTCCAA ATGTGATTGT AGAATTGTCT ATTTTGTCTT

192421 TTCTTTTCCA TTTTTACTTT ATGTATTTTG AAACTCTGTT ATGACATTTT GCTATGTATT

192481 TTAAAACTTC GTTATGTATT TTGAAACTCT GTTGTTAGAA TCATACATTT ATGATTATTA

192541 TGTTTTCTTG ATGAAATGAC CCTTTTCTAT TGTCGTTGTT TTTGTTTTTT CTGAAATGGA

192601 GTCTCACTCT GTTGCCCAGG CTGGAGTACA GTGGCACAAT CTTGGTTCAC TGCAACCTCC

192661 ACCTCCTGGG TTCAAGCGAG TCTCCTGACT CAGCCTCCAA GTAGCTGGGA TTACAGGCAT

192721 GTGCCAGCAT GCCAAACTAA TTTTGTATTT TTATTAGAGA CAGAGTTTCA CCACGTTGGC

192781 CAGGCTGGTC TCGAACCTCT GACCTCAGGT GATCCGCCCA CCTCGGCATT TTTATTTTAT

192841 TTTATTTTTT TGAGACAGAG TCTCACTCTG TCACCCAGGG TAGAATGCGG TGGTGTGATC

192901 TTGGCTCACT GCAACCTCCG CCTCCTGGGT TCAAGCAATT CCCATGCCTC AGCCTCCCGA

192961 GTAGCTGGGA TTACAGGCAC ATGCCACCAT GACTGGCTAA TTTTTGTATT TTTAGTAGAG

193021 ATGGGGTTTT TCTATGTTGG CCAGGCTGGC AACTGACTCC TTTAACAATA CAAAATATCA

193081 CTCTGTCTCT GGTAACACTC TCTGTCTTAA ACTCTATTTT AGCTGTTATT ATTATAGCCA

193141 TTTTAGTCTT TTTATGCTTT CTGTTTGCAT AGTGTATATA TTTTAATATG TTTATTCTCA

193201 AGTTATCTGT GTTTTTATAT TTAAGATGTT TCTCTTCTAG CCAACGTGTT TGGTTCTTGC

193261 ATTTTTAAGT CGATTCTAAC AATCTTTGCC TTTCAATTGA AATATTTACA CCATTAACAT

193321 CTAACATTAA CATTTATTTT TCTTTCCACA GTACACTGGC TAGCATCTCC CATATAATAT

193381 TGAACATAAA GTGTGATAAC TGACATCCTT ATTTCATTCC TACTCTGAGT GGAAAGGGCA

193441 GGGGTGGAGA AAGCATTCAA CAATTTGCCA TAATTATAAT TCTTTTTGTT ACACTGTTTT

193501 CTTCTGCATT AAAAAATATC ATTACATTTT GCATGAATTA TTAGGAGAAA ATATTTTCCA

193561 ATTTTCCTGG AAAATGCCAT AACCACGTCT CTCAATTTTG TTTCCATCTT TCTTCCACAT

193621 TTTACATAAC CTACATAAGA GACACATTAT CAAGTATATT TTACATGGCT TCTCAGTGTC

193681 TTCTCTGTCT GCTAACAGGT TTACCAAGAG ATGGCACTCT TGTATTTCTG GTGGCTATGT

193741 CCATATCGTT TTGCCTTTAA GACAGCGTAA CTACTTCTTT CACCAGTATT AAAGACATGT

193801 ACATTTGATC TGGTTCTTGT GGATGATTTT AAATGACTCA AGCTAATAAT CCTAATTTTA

193861 CCTAAACACT CCATTATTTT AAAATGTATT CCTTTATGCC CACAATAAAC ATTTATTGAC

193921 ATTAGGCTGG ACATTAGGCT TCTCTATGGC AGACATTAGG CTGGACCCTA GCCATATATC

193981 TATTGAGGGA AAAAAAATTA TTTTCTATAT AAGTTTCCAG AAAGCCAAGA TGTGTTTTAA

194041 AAACAAAACA AAACATTACA TTCTAAATGC TGTAACAAGA TAAGAAAAAG TGTTGAGGCT

194101 GAGAGAAGAA CAAAGCAGCA AGCAACTCCT GGAAGGACCA CTGCTGCAGA GGTAATAACT

194161 GGTGAACCAT GTTTTGGAGA AGGAAAAGGT CACCAAGAGA AGGAGGGGGT CCAGGGTGTT

194221 CAGAAAGATT GCATGCATAA AGATCAAGGG TAATAAAAAA AATTCCGTAT TATGTAAATG

194281 TGAAGTTCCA GGACCATGAG CTTGGAGAGC ATGAAGTACA GGAGGAGGGT TGGTTTCAAA

194341 TAAATCTGGG AATGAAACAG TGAAGCCTCT GGCAGAACTC ACATCTCTTT CCTCCCCTCT
194401 TCCTTGCACA TTCCCTTTAT GGAGTAATTG CAGGGATGGG AAAAGTTCAA AACCACCACT

194461 GAGCCTAGGA AGTGCTAGGG TAAAGTGGAG AATGAACCTG CGTGATTTGC TCATCCTAAA

194521 CTAGGTTCTT CTAGGAGAGC CCTTCCCCAT AAAATCTGCC CTCCTCGAAG GGGCCCAGAC

194581 AGCCTAAGCT CACCTCCCAA AGACCCCTTA CTTGCTGACT GAATCTGATT CCACCCAGAC

194641 ATGGCCTAAA ACCCTTCCAT AACTCTATAG CCAAATTCAA TTTTAGACAG GCCTCATACC

194701 AACCTTTCTT CCTCTAAGTC TGCCACCCTA GGCAATTCTC AACATTCTCT ACACACTTTG

194761 GGGCCATAGA CGTGCTACCA AGTCTCCAGA CCTAGACCTG ATGGAGCAGT GCTGTAATGA

194821 GACGACCACT GGCCTTTGAA CCAGACCCTT CTCTGTGGCT CCTATGCATC TCCAACCTGT

194881 TTTGAGCACT GCTGCCAAGA CATCTTTGGC ACTTTGTTGT GAAGTTTTAA AACTGAACTA

194941 ATCTACAAAA CACCTAACCT TTAAAAATTC ATTGTCATTT CATATCATGA AAGATAAAGA

195001 AAGGCCAGGA AACTGTTCCA GGTTAATAGA GACTAAAGAG ATAGCAACCA AATGCAATTT

195061 GTGATCCTGG ATTGAGGGGA AAAAGTGTTG TCAGAGACAT GATTGGGACA GCTGGTAAAA

195121 TTTGAATTTG AATTTAAAGA TAAAGTATTG AGTAATATAG GAAGATGATT ATCTGCAACT

195181 TTCAAATGTT TCAGTAAGTA TATATATATA TAAAGAGATA TAAAGACATA TAAATAAATA

195241 GATGGATAGG TAGAGAAAAA GCAAATGTAT AATATTAACA ATCTAGGTAA AAAGTATATG

195301 AGTGTTCTTT GTACTGTTTT TCTGATTTTT CTATATGTTT GAAATCATTT TAAAATAAGA

195361 AGGTTTTTGG GGTTTTTTTG TTTGTTTTTT GTTTTTAGAG ACAGCATCTT ATTCTGTCAC

195421 CCAGGCTGTA GCTCAGTGGC CCAATCATTG CTCACTGCAG CCTCAACTTC CTGGGCTCCA

195481 GTAATTCCCC CTACCTCAGG CTCATGAGTA GCTGGTACTT CAGGTGTGCA CCACTGCACT

195541 CAGCTAATTT TTATTTTTTA AATTTTTGTA GAGATGGCAT GTTGCTATGT CACCCAGGCT

195601 AGTCTCAAAC TCCTGCCCCC AAGTGATCCT CCCACTTTGG CCTCCCAAAG TGCTAGAATT

195661 ATAGGCATGA GCCACTGCAC CCAGCCCCAA ATAAAAAAGT ATTTTATTTT AATTAACTAA

195721 TTAATTTTGA GTCAGAGTTT CACCCTTGTC ACCCAGGCTG GAGTGCAATG GCATGATGTT

195781 GGCTCACTGC AAACTCTGCC TCCTGTGTTT AAGCGATTCT CTTGCCTCAG ACTCCTGAGT

195841 AGCTGAGATT ACAGGTGCCT GCCACCATGC CCAGCTAATT TTTATATTTT TAGTAGAGAC

195901 GGGGTTTCAG CATGTTGGTC AAGCTTGTCT CAAACTCCTG ACCTCAGGTG ATCCACCCAC

195961 CTCGGCCTCC GAAAGTGTTG ATGAGCCACC ACACCCGGTC TAAAAAGTAT TTTAAAACCA

196021 CAGTCCCACT CTACCTTGTC CTACACTACC AGGGGCTAGG ATCACCCCAT GTCTTCTAGG

196081 CTATGAGATA GAGGAATCCA AGGAAGAAGA TAAGCTACTT GGTTCCTCTA TAGGGTCTTG

196141 TGTGTGCTCT CATGTGCTCT CTCTCTCTCT CTCTCTCTCA CACACACACA CACACACACA

196201 CACACACACA CACACACATG AATACCAGAG CTATCACTTT CCCAGTCTAG TACTCATCTC

196261 ATCCCAAGGG TTTTGTGTTG TAGTGGTTTG CTCATTTGTT TGTTTTGTTT GTTTGCTTGG

196321 ATTATTCTTT TTCTCTTTTT GCAGCTGAAG GGAGAATTTC CAGGCCAGCC CTTTGGCCAT

196381 TAGAGTTACA GTGCCTCTAT TCAGGCTTCA TAGAGAGACC TGGGATTCAG TAGTGGGGGG

196441 CTTTTATCCA GTTCAAAATA ATGCATTCTC ACCAAGATGT ACTTTGAAAT AAAACAATAC

196501 TAAAACACAA AATTTTATTT ATGCTGAACA TTGAATCACT TTTTTCTGTA TTTTGTGTAG

196561 AAAGTTATAC ACACACAAAC ACATTTGCTC CTGCTTTGTT TATTGGCCCA GGGGTATGTT

196621 TGGTAATACT TCATCAGGCA TGAGTAGTAC GTCTTGGAAG GTGTGGTCTA AAGCCTAGAC

196681 TCCTATCTGC TTCCTTCAGC ATTCTCCAGT GTATCTGTCA TCTGTCTACC TTAGGATGGG

196741 GTCTCCAGAA CTTCCATTCA CATTTAGAAG AGGGCAGCGG CTTTCTATGG AAAATATGAA

196801 CTCTCATTCA TCTCTATTCC TTCTTCTAGC TATGGTCCAG CTCAGCTGTT TGGAATAAAG

196861 TATCTATATG AAGTCTGCGA ATGGTTCTCA GACTGGTTGA ACATTAGAAT CACCTGAGTA

196921 CCTTCTAAAA TTCTTATTAC CCAGGGCATA TCTCAGAATG AGTACCACAG GGTAGGGATA

196981 GGATTAGGGA TCATGATCTC TGGAGTCTGG TTTAGGCACT AGTGCTGTTT AAAACTACGT

197041 TCATGAGGTG GAGGTTGCAG TGAGCCGAGA TGGCGCCACT GCACTCCAAC CTGGGCGACA

197101 GAGTGAGAGT CTGTCTCAAC AACACAAAAC AAAAAAAACC AACTACCCTT GTGATTTGAA

197161 TGTCCATCCA AAATTGAGAA CCATTAGGTA AGGCCAAGCT GTATAATTAA AGAGCAGTTT

197221 TCATTTGTCT GGTGTGGTGG CAGCTTTTTG ATAAGGGAAG TATTGTTGCC ATCCACATAC

197281 CTGAGCCTCA CTCCTGAGAA CACTGGTGTG TATGTTGCTA AAATTCCCCA GGTGATTCTG

197341 AGGTTCCTTC CTGGATAAAA ACCACTGACC CTGGGAATGT ACCCACTGCC AATCTCCTGC

197401 GTAAACCTTG GATACTGGGA AGCCTACAGT TGAAAATATT GGGCTTGAGA TCCTGAAACA

197461 AATCTTGTAT TTCATTAAGA CTAATATTTG GTACAGTGCA GCAAATCAAG GGAATTTTGG

197521 TGGCTGAGTT CTTTTAGAAC TTTTGCATTG AAATAGGTTC AAGCAGCAAT AAGTTAAAAC

197581 TACAACCTCA GCTAAAGGAT TAAAAGACAC GTGAGCTGGG TAGGATGAGG TCTAAGATTG
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TGAGGTCAGG 
TACAAAAAAA 
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CTGCACTCCA 
CAATAATAAT 
CTTGAGATTC 
TATGATGAAT 
GACTTCCAAC 
TGTTAGATGG 
CACTTCTAAC 
CACTCCTGCT 
TCTTTAACTA 
TCAAGTGTAT 
TCTGTTTTTT 
AGTTATATAT 
ATTTATTATT 
CTTCAAGCCT 
TTGATTGTGG 
GCACCGGTAG 
TGTCACCCAG 
AATCAATCTG 
ACACCTGCCT 
CTGGTCTGGA 
ATAGGTATAA 
TTTATGTCTG 
TCCATCTATT 
GTGTCTCTCT 
CTTTGGTTGA 
CAGATGTGAG 
GTCTAGGTAG 
GGGTCTCAGC 
GGAGAACTCC 
ATGTCTTGGG 
TTTTGTGACC 
TGATGTGAAG 
GTCTCCTGCT 
GTAGCACTTA 
ACTAAACAGT 
CCCATCATTT 
CACAAGAAAA 
TGTACTATTT 
AAAAATGAAA 
ATTTTTAAAA 
ATTTTTCATG 
GTTTTGAGGA 
ACCGTCATGA 
TTTTGGTTTG 
ATACAGTAAA 
GAAAGTCAGA 
CTCTATCTCT 
GTATGTATAT 
CACTACAGTA 
CTTGTATATA 



CTCATACCTG 
AGTTCAAAAC 
TTAGCTGGGC 
AATCACTTGA 
GCCTGGGTGA 
AATTCAGACA 
AAGTCACACA 
GGAAATTTTT 
AGCAATAACA 
ATAAAGAGAT 
CATCAGATAT 
TAAGACATTT 
CAGGGTTGGT 
GCCATGTACA 
CCTTCATTGA 
AATTGTTACA 
ACCTAAGGTC 
CATGTGGCTG 
AAATTTCCAC 
AAGCTTCTTG 
GGTGGAGTGC 
TTCTCCCACC 
AAAAAACAAA 
ACTCCTGCGC 
GCCACCATAC 
TCTTCCATGG 
GATTAGATAA 
TTATCTTAAA 
TCCTTCTTAA 
TCTATGGGAA 
AAATCAGTCA 
CAAGGTCTTG 
CTTGGAATAT 
AATCTTGGTC 
AGATAGTAAA 
CTTCTGTGGT 
TGGGAACAAA 
CTTTTCAATT 
CTAAATATAA 
AATTTTTTCT 
CCTGGCATAT 
CATGTATTCT 
ATTTTGCATT 
TGTTTCCCGA 
TGACCTCAGT 
AATATAGGAA 
CCTGGTCCTG 
GATGCTTTGT 
TAAATGCTAT 
TTCATCTAAA 
CTCTCTCAAT 
ATCTGTTTCT 
CTAGCATTTT 
TACACACACA 



TAATCCCAGC 
CAGCCTGGCC 
GAGGTGCCAG 
ACTCAGGAGG 
CAGAGCAAGA 
TATCCAGGCA 
TGAAATTTAG 
CAAAGAGGAA 
CAGGATTAAT 
AAAAGTACTC 
AACTAGCAGA 
TAATTACTCT 
CTGGGTGTGC 
GGTATTCTTT 
AGTCAATGGC 
AAACAAATTA 
TGTGGATAGA 
CAATCCAGGT 
TTCCAAGCTC 
GTAGAGGCTG 
AGTGGAGCAA 
TCAGCATCCT 
CAAACGAAAA 
TCAAGCAATT 
CTGGCATATG 
TATTCTAGGT 
AACGTTGTTC 
ATTCTAACCA 
CCTCTTCTTG 
AGCAAGCAAG 
TGGCCCTTCC 
TGGCCTAAGC 
CTTTTTTTGT 
TAGAGCCATT 
TAAGTTCTAT 
TCAGCCCTTA 
AGTCTGGCTT 
AGGAGTGTCC 
AAATCATGTC 
ACTGGGTTAT 
ACATGGATTC 
TTTTCACATC 
TGACTAAATT 
AGTTTTGAGT 
GCACTGCTGT 
CGACAAGATA 
TTGACTTCCC 
GTTTTCCTAA 
TTGTGTGAAA 
AATCCTTGTG 
GTATATTTGG 
ATTCCTGTAT 
TCTAATGTAA 
TACACATACA 



ACTTTGGGAG 
AACATGGTGA 
GCACCTGTAA 
CAGAGGTTGT 
CTCCATTTAA 
TCAAACAGAT 
GTGGAAAATG 
TTTCAGGCTC 
GAGGACTTGG 
TCTCTAAGAA 
CTAAACGGTC 
CAGTAACTCT 
AACACAAGAA 
CATGTACTAT 
TGATATTAGA 
GCAAAAACTT 
AGTTCTGACA 
GTTGGCTGAG 
CCTCAGGTTT 
ATTCAACTTC 
TCATAGCTCA 
GAGTAGCTGG 
AAAACCCCCA 
CTCCTGCCTT 
GCAAGTCTTG 
TATTGTTGAG 
CTTCTGTTAT 
AAGAGCTGCT 
CCCTCTGGGG 
AGGTTCTTCA 
AATGTGGTAC 
CTTATAGAAA 
GAACCTGAGG 
TCAACCTGAT 
GATGTTCACT 
CTTCATCTTC 
CATTCTATGA 
TCACTTCTAT 
CTACTCCTGC 
CTTTAACTTC 
AAGTGTATGC 
TGTTTTTTCC 
TGTCAAATTT 
GAAGTTAGTA 
GCATTTCCAT 
AAGTTCAAGC 
TAGATTTCCC 
AATCAAAATA 
CTTTAAACAA 
GCCAGAATTA 
TGTAGGTATA 
GTGGATGTGC 
TTCAATATTG 
TGCATGTATG 



ACTGAGGTGG 
AAACCCATCT 
TCCCAGCTAC 
AGTGAGCTGA 
AAAAATAATA 
ACCTGGGGCA 
ACATTGGAGA 
TGTTCTTGAG 
GATGTTACAT 
CATGGGACCA 
TAAAAATAAA 
TCAGTTTTTC 
AGCCTGGCAT 
TTCATGTATT 
TTCTACTATT 
AGTGGCTTAA 
TGGCTTAACT 
TCTGAATTCT 
GTTGAAAAAT 
TAGAGGCTGT 
CTGCAGCCTT 
GACCACAAGT 
GAGAACTTTG 
AGCCTAAAAG 
AGCAGGACAA 
ATGGTCCTCT 
TTTTCAACAG 
CTTTTCTTGG 
CCTAAGATGA 
GCCTCCGTTC 
AGACCAGATC 
TAATGAGTGT 
CAACTTTTGG 
TTCTTTTCAT 
CAGAGAAATA 
ATTCCCTCTT 
CCCCCACGTT 
CCATCAGACA 
TGAAAACATT 
AGAGTTGGTC 
CACGTGCATG 
TCTAAAATTT 
AGTCAAATTT 
CTTCAGAAAA 
TTCTGCGTCC 
TCCTGGACAT 
GCTATTTCCT 
GGTTTTTGCC 
TACAAAAAAA 
ACTACCTTAG 
GGGGTGTGTG 
ACAACGCATC 
TTGAAAACAT 
TACATATACA 



GTGGATCACT 
CTACTAAGAA 
TGGGGAGGCT 
GATCGCACCA 
ATAATAATAA 
GATGAATAGT 
AATTTGAGAT 
GGGATAGATG 
AAATTAGAGA 
GAGATAGGCT 
AATCATGCCC 
TACTGTGTTA 
ATACATGGAT 
CTTTTTCACA 
CATGTGTACT 
AGCAACACAC 
GGGTTCCCTG 
CATCAGAGGC 
TCAGTTCTTT 
CTGCAGTTCC 
GACCTCCCAG 
GTGTGCCATC 
TAGAGACAAG 
TTCTGGGATT 
ATACAGATGA
ATTGTCTTGT 
TAGCTTTTAT 
TGTACTTTAC 
GGGCTGTTAT 
AGCCTTAAAT 
ACAGAGACAG 
TTACTTACTT 
TGATTTCTTG 
GTCAGTGGCA 
CAATGACTTA 
ATCTGCATCT 
GAGTTTCTTA 
TAACTAGCCG
TTAATTACTC 
TTGTGTGCAA 
TATTCCTTCA 
ATTTCCTTTT 
GTTTAAAACC 
ACTGTTTTGT 
ACACACATTT 
TGCATAAAAG 
AAGTTGAGAT 
TTTTATGATT 
ACCTAAGGAA 
TTATTATTTT 
TAGTGTGTGT 
CTGCTTTGTA 
TTTAAAAAAG 
CATACAGACA 
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200881 AAAATGTATC CTATGTATAT TCACACATGT ATACACACTC ACACGTACAT AGAGTTTTAC 

200941 ATCCATAGTT TATAAATGTT GCTTTTTTTT GGTCACCTTT TTGCTAAGTC TTACACTTTT

201001 TTTTTTTTTT TTGAGACGGA GTTTTGTTGT CATTGCCCAG GCTTAGTGCA GTAGCGCGAT 

201061 CTCACCTCAC TGCAACCTCG ACCTCCCGGG TTCAAGCGGT TCTCCTGCCT TAGCCTCCTG 

201121 AGTAGCTGGT ACTACAGGTG TGCGCCACCA TGCCTGGCTA ATTTTTGTAG TTTTTTTATA 

201181 GAGACGAGGT TTCACCATGT TGGCCAAGCT GGTCTGGAAC TCCTGACCTC AAGTGATCTG 

201241 CCTGCCTCAG ATTCCCAAAG TGCTGGGATT ACAGATGTGA GCCACTGCAC CCGGCCAAGT 

201301 CTTACACATC TTTTTTTTAC CACTAAACTG TTTACCCAAA CCTGATAACC CAAGTCAACA 

201361 GCTATTATGG CTCACACAAT CTTATGTAAA CAAAGATACA GATATATAGA ATTTTCTTGA 

201421 TTAATATTCA GAAAAAAATG GAGTCCCTTT ATACGTCCTT AGTATCTGCT TTACTCATTT 

201481 AAAAATGTAT TACATTATAT GAAAGTATTC AGGTCAAATG TTATAGATGT GATTCATTCT 

201541 TTTTAACTGT GTTATTTTTC TGCAATGACT ATGTATCACA AAGTACTCAG TCTTCCACTG 

201601 ATGAAAATTT GGGCTATTTC CAGTTTGTCT TCCATTTTTC TTTCTTCCTC TTGGATTTTC 

201661 ACTCAATGTG TTTACTAATT TAGGAAGAAT CAATAGTTTT TATGGTATTA CTTCTCCCAT 

201721 TCAAGAATAT AGCATATGGT ATAGTATAGT AGAGTACTTA GTTTAATTTA GCCAGATCCT 

201781 GTTTTCTGCC CTTTAATAAA ATTCTATCAT TTTCTGCCTT TGAGTCACAT TTTCCTTGTT 

201841 CATATAATTC TTAAAAAATG TATAGTTTTC ATTCTAAGGG AACATAAAAA CTTCTTTCCA 

201901 TTTCTATTCC TGTCTAGTTA ATTCTACTAT TGGGAAAAGT AACTGTTAAA AAAAATTCTT 

201961 ATCTTTCCAG TCAGTTCACC ACATTTCCTT TATACCTTTG TACTTTAATC CCCAGTCATG 

202021 TTGAACACTT CTTATTCCTC ACACCAAGCC TCAACGGGTT TGCTCTTTCT GGAAGGTGCT 

202081 TCCCCTGTAT TACTGACTTA TTCATACCAC ACATGGAGAC TGGCGCAGCC CTGTTCTGCC 

202141 TGGGAAGCCT TCCCCTGATA CCCCTAGTTG GCAGGAGTCT TCATTTGTTC TTTTCTAGTC 

202201 ACCTGTGCAA GTTTGTATTG TTCATGTTTA TCATCCTTCA TTCTAGTTGT CTGTCTCTAT 

202261 GTGTGGTCTC ATTCAGTGGA CTCTGAACTC TTATGAAGTC ATGTCATGGG TCAGATCTTA 

202321 ATAAATTAAT ATTGTCGGAA GCTAATGTCA TGTCTAGAAT ACAGAAAATT TATCAAAAAA 

202381 AAATATAGTA TGTTGGCTGG GCGCAGTGGA TCAAGCCCGT AATCCCAGCA CTTTGGGAGG 

202441 CCGAGGCAGG AGGATCACAT GAGGTCAGAA ATTCAAGACC AGCCTGGCCA AAATGGTGAA 

202501 ACCTCATCTC TACTAAAAAT ACAAAAAGTA GCCAGGCGTG GTGGTGCCCA CCTGTAATCC 

202561 CAGCTACTCA GGAGGCTGAA GCGGGAGGAT CACTTGAACC TGGGAGGCAG AGATTGCAAT 

202621 GAGCTGAGAT CATGCCACTG CACTCCAGCC TGGGCGACAG TGAGACTCCA ACTCAAAATA 

202681 ATAGTAATAA TAATAATAAT AATTGTATGG AATTGAACTG CTCTGATTGG AAATAGCTGT 

202741 TTTTTAAAAA ATTATTATTT TTTAAGTTCC TGGGTACATG TACAGGATGT GCAGGTTTGT

202801 TACATAGGTA AACGTGTGCC ATGGTGATTT GCTGCACCTA TCAACCCATC ACCTAGGTAT
202861 TAAGTACAGC ATGCATTAGC TCTTTTACCT AATGTTCTCC CACACCCCCA CCCCATCCTC 
202921 CCCCAACAGG CCCCAGTGAG TGTTGTTCCC CTCCCTGTGT CCACGTGTTC TCATTGTTCA
202981 GCTCCCACTC ATAAGTGAGA ACATGAGGTG TTTGGTTTTC TGTTCCTGCC TTAGCTGTTA 
203041 ATGTCAGGCC AGAGAGGCTT AAATTTTTAA GGATCTCTGG ACTTTTCTTC TACATTACTC 
203101 TTGATGTTTA TAAATGTTAC AACTTCTTTA ATTTCATTAA ATGTATACCT TATTGAGTTG
203161 ATTTAACTGA GTTAACTTTG TTATATGAAA ATCATGATTG GGAGTGAGGG GGTTAAACCA 
203221 GCTACAGAGA TCTTGATTGT TGGTGGTGAA GCAATGCAAG AATTCATTCA TTCAGTAAAC 
203281 TAATGTTTAT TAAGCGTGTA CTGTCTTAGT CTGTTCAGAC TGCTGTAACA AAATATCATA 
203341 AACTGGGTGA CTTATAAACA ACAAAAAATT TATTTCTTAC AGTTCTGGAG GTGGGAAGTC 
203401 TAAGATTAAG GCCCTGGCAA ATTTAGTGTC TGGTGAGGAC AGGTAGCCAT CTTTTTGCTG 
203461 AGTCCTAACA TGGCAGAAGG GTTGAATAAA CTTCCTTGGG TTTCTTTTAT AAGGACACTA 
203521 ATCCTAGTGA TGAGGTTTCT GCCCTCATGG TATAACTACT GCCCAAAGAC CCCTCCTTCT

203581 AATATTATCA CTTTGTGGGT TAGGATTTCA ACATGAGTTT TGAGAGGATA CAGACATTTG
203641 GATCATAGCA CACACCATAG GACAGACACT GTGCCAAGAA TTGTGGATAT AGTGATTCTC 
203701 AAAATGAACA AGATCCCCTC AGAGAGCTTG CAAAATCCAG CTATAAAATT ATGCTTTTTA 
203761 AACAAATTAT GCAGTTTGAA AAATCTACTC TGAATCTTAC TTGTGGCATT GAATACTTTC 
203821 GGCCACTCTT TCCTTATTAT ATTAAATATT TACTCTTGTT TGGGGGATCC AGTCTCACCT 
203881 ACTTTTTCTA CCAGAACTGG TATCAGCTCA TGCTCTGCCT TATGCAAATT AAGAAAATAT
203941 CATACCTTTT GGGTAAATTA AGCCAAGAAA GTTCTCCTTT CTTCTCTTTC TCTCTTTCTT 
204001 TCTTTCTCTC TTTCTCTTTC TTTCTTTCTC TCTCTTTCTT TCTTTCTTTC TTTCTTTCTT 
204061 TCTTTCTTTC TTTCTTTCTT TCTTTCTTTC TTTTTCTTTC TTTCTTTCTT TCTTTCTTTC 
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204121 TTTTTCTTTC TGACAGGGTC TTGCTCTATT GCCTAGGCTG GAGTGCAGTG GTGCAATCTC 

204181 AGCTCACTGC AGCCTTGAAC TCCAGGGCTC AAGCAATCCT CCTGAGTAGC TGGGACTATA 

204241 GGCATGTGCC ACAACATCAA GCTAATTTTT GCATTTTTTT GTGGAGACGG GATCTCCCTA 

204301 TGTTGCTAAG GCTGGTCTTG GATTCCTGGG CTTATGCGAT TCTCCTGCCT CAGCCTCCCA

204361 AAGTCCTGGG ATTACAGGCA TGAGCCACTG CCCCTGGCCA TTATAACTAT TTTCATTGGC 

204421 TTATCAGGCA CATGATAACT ATAATAAATC AATAACCAGA ATTTTTAAAT AAAGAAAGGA 

204481 AGGAATTGTT TCAACTCTTC CTGCTACCCC TCTATCCCTC AAAAGGGTAG GCTGAATGTT 

204541 GTCCTCCAAA GATATCCATG TCCTAATCCC CAGAACCTGT AAATATATTA CCTTATATGA 

204601 CAAAAGGGAC TTTACATGTT TAATAAGTTA AGAATTTTGA GATGGGCAGA TTTTCCTGAA 

204661 TTTTGCAGAT GGGCCCTAGT GTAATCACAA GGGTCCTTAT AAGAGACAGG CAGAAGAGTC 

204721 AGAATAAGAG AAAAATACTT CAAGATGTTA CACTGCTGGC TTTAAGGTGG AGGAAAGGCC 

204781 AAGAGCCAAA AAATGCAGTG GTCACTACAA GCTGAAAAGA AAAAGAAATG GATTTTCCCC

204841 TAAAGCCTCT GGAGGGGGCA CAACCTTGCC AATACCTTGA TTTTGGCTCA GTGAAACCCA 

204901 TTTTGGACTT CTGACCTTTA GAACTGTAAA TAAATAAATA ATTTTGTGTT GTTTCAAGCC 

204961 ATCACAGTTG TGGTAATTTA CTACAACAGC AATAAAATAG AATTAAATAC AGAGATCTGA 

205021 GGAGTTGAGT AGGATAAGCC TACTCCAGCA GGTTATTTCG GGAGTATGGT GAGACTCACT 

205081 AGGATGGCGG AACTCAATTA AGGAAGTCTG AAGCTGATAA GCCAGAGAGG GAAGGCTCTC

205141 ACTTCATTTT ATAAGGGTTG CGTCACACTA GGAAGATCCA ATAGCAACCA CAGTCTCAAA 

205201 ATTAATGATT ACAAATAGGA CACAATTCCA AGAGTCGGGA GCCAAGCAGA AAATGGATTA 

205261 GGGAAGACAT GGATGATATG AAACAGGAAG GAGGGGTACA AGGCAGCTTC CTGGGAAGTT 

205321 GCCAGGGCAG TCACAGTTCA CATTCATTAG GCTGTGGGCA CCAAATGCAT ATGGAAAATC 

205381 TAGCTGACTT AACTGAACTC CTGAAGAGGA ATGAACACCT CATTTATTGA GGAGCTACTA 

205441 CCAATTAGAA TATGTATTTC ATTTGTTCAA TAACCCCATG AGTACAGTAA CACAATCCTT 

205501 GCTTTACTAA AGCGGAAGCC AATTCAAAGA GGTTCAGTGA CTTGTCCAAG CTCAGGGAAA 

205561 ACACTAGGAA GTGAATATGG GTCTGACTCC ATCACTGATT TCAGGAGCCC TGCCCTTTCC 

205621 TCCACACCAT GCCCCCTTGC TTTCAGAAAA AAAGGCTTGT TGACTGAATG GTTGTATGCA 

205681 CAGTTCAAAG CAGAAACACA CGATGACATC TTTTGAGATA CTCTAACAGT GAGAACTTGA 

205741 AAATGAAGTT AAAAATTAAG CGGCAAAACC AAGCCGAGGC TTTCTGAGAA AGTGGGGCCA 

205801 AACCTGTTGC CGTCTGACTG CCACGTGGCT CACTATTTAT CCCTGTAAAA ATCTGCAAAA 

205861 GTATTTGAAA GGGAAGAAGG GACAGAAAAC TCCCTCCTTT TCCAAGTTAG CCTTATAGTC 

205921 TAGGGCTTAA AATACTGGTT TAATGGTGAA GGTAAGTGCT TTTCTTCTTT TTGGGTAGAA

205981 GGATTATTAC TAACTTACCA AAGGTCCATT AAGGGGAGGG AACAGTTTTA GGAGAAGTCA 

206041 GAGAAAAGAC ATTAACAGCA ACATAAGGAT CTCCATCTGG TAATATTGCC TAATTCCAAA 

206101 ATGAAGAGAC TCTCTGAAAA AGATAACTGA TTCAATGAAG ACCCTAGGGC AAGGCTTGAG 

206161 AAGCCACTGG TACCAATGGA CACTGTGGAC AATGGTCATT TCTCCAAGGA CGCTGTGAGT 

206221 ATTAACTGTG ATGCTGTGAT TAGTCAGACT GGGATTGGCT GTGGAATGAA ATACTGATCA 

206281 GAACTGACAA GATTTGTGTT TGGGACTGTG GCTAACGAGT CTTTTCAGAC TTCTATATGA 

206341 ATTTGAAATG GTCTCTCAGG AAAAGGAGAA CATGGCCGGG CCTGGTGGCT CACGCCTGTA 

206401 ATCCCAGCAC TTTGGCAGGC TGAGGCGGGC AGATCACTTG AGGTCAGGAG TTTGAGACCA 

206461 GCCTGGCCAA CATGGTGAAA CCCTGTCTCC ACTAAAAATA CAAAAATTAG CAGGGCGTAG 

206521 CGGCGCGTGC ACCTATGCGC ATGCATAGTG CGCGTGCCAG CTATTCAGAA GGCTGAGGCA 

206581 GGAGAATTGC TTGAACCCAG GATGTAGAGG TTGCAGTAGT TGAGATCATA CCACTGCACT 

206641 CCAGCCTAGG TGACAGAGTA AGACTCTGTC TCAAAAAAAT AATAATAATA AAAGAAAAGG 

206701 AGAACATGAC CAAAGTTATG AATAAGACTG AAGGCAAGAA AATTGTACGC TTGTAGAGAT 

206761 CACCTAGCTT GTTGCCCTCA TTGTACAGCT AAGAAAAGGC ACCCAGGGAC ATTGTGGTCA 

206821 GCACCAATTT CTCAGAAAGA TAGGCAGATG ATGAGAGGGC CCTCAGTTTT TCTAACACTG 

206881 AAGGAATTGC TTCTATGTTT TCTGGTGAAC TCCTCCCCAC TCATCTTGAG GATTCCAGGC 

206941 CAGAAGAATC CACTTTAAAA AAGAAACATT TAAAACCAAT TTAACAACCA ATCAAAGGCA 

207001 CTTTTATAGA AATACATTTC ATTTGCTGTT GGCCTGTATT TATGGATCTG AGAGGGCTAG 

207061 ACTGCCAATA TTGTGACTGT TTATTATTAT TGCTGTTGCT AGTATCTAGA ATATTATACA 

207121 ACATATAACA CTTTGCAATT TACGAGGCAT GTCTCATACT TTTGTTTTCA CTCCAAACTG 

207181 CCCAGTGAAG TAACATTATC CCAATTCTTC CTATGAAACA GTGAAAGCCC TAAGAGTTTT 

207241 TGAAACTTTA CCTGGTTTAC TCAATTTGGG AATGGCAGAG CAGAATTCAG TCCTTGAATA 

207301 TCCTCCCACT GCAGGTTCAT GCTCTTTGAT CTAGGTGTAA CATTTACTCT GAGTAAACTA 
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GGACTCTGGG 

ATCTAACGAC 

TTTTTTGCTT 

TATACGGTGT 

TTTTTCTGTG 

TCTCCTTAAA 

AATAATTATT 

TTTATTAGCT 

TTAGGATGGT 

AAACGAAGAA 

CTTAAGTAGG 

AAAGTGATGG 

TGTGTTTATG 

TCTAGAATAA 

AAATGGAGCT 

AAGTTGTAAT 

TGCATCATGT 

ATAAACAGTT 

GAGAAGTTGG 

AAGAGTTGCC 

GGTCCAAACT 

AGCTGAACTC 

AACCAGTATC 

CATTGTACAA 

TTAGATGGAG 

TTGCCATGCT 

TAAAATGAGT 

AATGAAGTGA 

CAGTTTCTAT 

AGCATTTTTC 
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CTAACAGAGA 
CATTATAATA 
ATGCGTATAC 
CAAGTAATTT 
TCTACTTACA 
GGGAAGGGTT 
CCAGTGTCTC 
TCTGTGCTTA 
TTGGTATGTT 
CTGAAATTAC 
GCTTTTCATC 
GTTTGTGATA 
TTTAATATTC 
ATGATTAAAA 
ACCCCATTGA 
AGGTAGAACA 
GGTTTCAGGC 
GGGCCAGAGG 
TGGGAAAGCT 
TTCAGCCAAG 
CTGGGTTTGA 
CTGATATCCA 
TGTCCTGGTG 
AACAACAACA 
AGATACTATT 
GATGAAGTCC 
ATCTACTAAT 
TCATCCTGTT 
TCCTGTATGT 
TAATGTAATT 
AATTGACTTG 
GTAATACTAC 
TTCATCCTAA 
GTGTGCAAAA 
TTGTTTGTTT 
GATCTTGGCT 
CTTAGTAGCA 
GAGATGGGGT 
CCCACCTCAG 
CATGTGTTTT 
AATAAAGTCA 
ATGGATTTTT 
AAATATTAAA 
GCTGGATTTA 
TGAAGACTTT 
ATTTTAAACT 
AGGCACATGG 
GAAAGACCGC 
ATGGTCTGCT 
GTACATGCAA 
TGAACAGATG 
TGGAACTTAC 
CCCATAAAAT 
GGCAACATTA 



TGAAGCAAGA 
AAATCATGAG 
CATAATATTT 
TTTTTAATAT 
ACTTTGGCAC 
CTGACACTGT 
TAAGTACATA 
TTTTGGAAAA 
AGCCTGATTT 
CTATTGATAC 
CTTTCTCGTT 
CAATTCCAGT 
AAAGCTCAAC 
CTTGATTTAA 
GTTTTAAGCT 
AGCAGTAGTC 
AACTTTTCAA 
ATCTCTGAGT 
TTAAGTGGAG 
CCACGGGATC 
CCACAGATGA 
GATGTTAGCA 
CTGACCTGAT 
ACAACAACAA 
CCCAGAATTC 
AATTATTGCT 
TATTTACAAA 
TTGTAACCCA 
GGATGTGCAC 
CAATATTGTC 
CCAGACTCTC 
AAAGGATATT 
GGTCACAGAT 
ACAGTGCAAA 
GTTTTGAGAC 
CACTGCAACC 
GGGTCTACAG 
TTCACCATGT 
TATCCCAAAG 
TAAAGTCACA 
TAGAAGCTTC 
CCTAAAAGAA 
TTAAACATGT 
TTCACAATTG 
GTCAGTCCAA 
TTTAAATGTA 
AACATTGTTC 
TCTGGAACCT 
ACAAGCAATA 
TTTTTCATTT 
AGGAAATGAA 
AGCCAGATTT 
GTAAGTTATA 
ACAAGGGGAA 



CAGGCTGGAT 
TTCTAGACTT 
ACATTATTTA 
AACATTTTCC 
TAGAATTCAC 
TACATGTTCT 
TCAACCATGC 
ACATTTCCCA 
CTGCATTCGT 
AAAATCAAAG 
AGACAGCAAC 
AACATAAAGA 
CTAAAAGTAT 
AATATACAAA 
TGTGATTAAA 
TAGGCATTAG 
ATTTTCTACG 
CTCTTTCAGC 
TGTAAGTAAT 
TTGCATAAAA 
CTTCAGCTAG 
AGACTTGGAG 
CTTACTAGCA 
TAAAATCTCC 
TAGAGATATT 
CTTTTAAATA 
ATCACTTGGT 
GAAATAGTCA 
AGCGTATCCT 
GAAAACATTT 
ATTATTAGGT 
TTTGGACACA 
TATGAATATC 
GCCTTGAATG 
GGATTCCTGC 
TTTGCCTCTT 
GCATGTGCCA 
TGGCCAGGAT 
TGCTGGGATT 
GAAATTTCAG 
AATTTAGGAA 
ACAAATGTAT 
CCATATTTAG 
TAGTAATTAG 
GCAAGTGTCC 
ATACATATTA 
TGGTGGTACA 
TCCTCCTTAG 
CCACTCTTCA 
AATTCTTCCA 
TGATTAGAGA 
CCTTTTAACA 
GAGCTGTGTT 
ATTATTTGTG 



ATTAGGAGAA 
AAAAAAAGGG 
TTTTTTTCTC 
TTTAACTTAA 
AATTTTTTTT 
CAATTGTTTG 
CAGTGTTCAG 
TTACCATGAA 
CTCATGCAAA 
TAGCATTTGA 
AGAGAATGGG 
GCAAGGAGAA 
TTTTCATTAT 
TTCTCCTTTA 
ATATTACGAA 
GGGATCTGGT 
CAAATTTTCT 
TTTCAGTGTT 
TGCAGCTGCA 
AGTGAAATCA 
GATCTGAGTG 
GCCTTCTAAG 
ATTGGGCCTC 
AAACACCCAA 
TGGAAAGCAG 
CATTTAGCTA 
AAATATAGAA 
TTACTGGCAC 
GCTTTGTACA 
TAAAATAGCT 
TAATTTATCT 
ATTTTTCATC 
TTTAAAGTAC 
ATAAAATAGA 
TCTGTCCCCC 
GGGTTCAAGC 
CCACACCCGG 
GATCTCGAAC 
ACAGGTGTGA 
ATGTCTTGAA 
TGAATGGAAA 
GCATCCCCAA 
AGCCATGAAT 
TCCCTGTTCA 
ACATTGTGTG 
GTGTTATGTA 
GAGGGGAGAG 
CTCTTGAGCT 
CCTTCGCATG 
GCTGCACTAA 
ATTTAAATGA 
ATCCTGTAAC 
GGGTCAAAAC 
TATTATGTTT 



TCTAAGAGCA 
AAAAACCTGT 
AAATTCAACC 
TTTCAATTCA 
TAGAGGTATA 
CAAATAGGTT 
CCTCCATAAT 
AGACCTCAGT 
GGAAAATAGG 
AACCATAAAA 
AAGAAAAACT 
GTAGTTTTGT 
CAAACTTCCT 
TAATACCTCA 
AACAAAGGGG 
GCTGGCTCTG 
TATCAATAAA 
TATAAGATTG 
TGTACAGTTA 
AATAGAAAAT 
TAGAGCAATG 
GCAGAGCAAC 
CATTTGGGTC 
AATTCAAAAT 
AAAACTATAC 
CTTCTGAATA 
AGTCACAAAG 
TTGTGTGAAT 
CTAGAGTACT 
TCCATCACAA 
CTAACATTAT 
TATGCCTTTC 
GGACAAGTCT 
GGTTTGATAT 
AAGCTGTAGT 
AATTATCCTG 
CTGTTTTTGT 
ACCTGACCTC 
GCCACTGCAC 
GGATTTTAAG 
ATTGATGATA 
AGATAATTTG 
TCTCTTTGCC 
TTATAATTTT 
TAGCAAACAT 
ATGTCATCCT 
AAACACCATC 
TAGTTTAATT 
CTTCTCTGTG 
GAAAGGAGCC 
CTAGCTCTAG 
CAAAAGCATA 
TTTTACTGAT 
TGGATTATGT 
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TCTCTCCATA 
CACAAAGCGT 
AGGTAAAGCC 
ATTTACACAT 
GCCAGCTCCA 
AATTAAGCCA 
TCTTTTTTTT 
GCGATCTCGG 
TCCCGAGTAG 
AGTAGAGATG 
CACCGGCCTC 
GGAAAGTCAT 
CTGGCTCTTT 
CACCACTCTT 
ATCTGTGTCT 
GAAAAATCCA 
TTTTATGGGG 
CACCACATCA 
AGATTGAGTC 
AACCTCTGCC 
ACAGGCATGC 
CATGTTGACC
CAAAATGCTG 
AGTTGAACAT 
ATTACACTAG 
AGAGAATCCT 
AAGCTTTGTG 
TTTTATTGAC 
ATTTTGGAGC 
TCCTATCCCC 
TGTCTTGAAG 
AAAAAAAGAC 
AAAGGAAAAG 
GGTCCAGATT 
ACCATGATAA 
CAGCAGCAAG 
TCCAGCATAT 
TGTTGAGAGA 
ATACATTTCC 
CTTCAATAAT 
AGAGTTAAAT 
TAGGTATTAC 
TTATTGTTTC 
CGATTCTTGT 
TCTCCACTCA 
TGAGATAGAG 
GCAGCCTCCG 
GTAAGGGGGC 
ACTAGGCTGG 
TTGGGATTAC 
GCATTGCTTC 
CATCTTACTT 
TTTACATTTA 
TTGTACTGTA 



GATAAAAGAC 
GGTACCATTT 
ACTGCTCTTG 
CTCTGCATCA 
GCCCCTGATC 
AATAAGCAAT 
TTTTTTTTTG 
CTCACTGCAA 
CTTGGACTAC 
GAGTTTCGCC 
GGCCTCCCAA 
TTTAAACCAA 
CTCCTGAGCT 
ATCTGTGAGC 
TCACAGGTTT 
ATCTATCATG 
ATGCTTTTAA 
CCTGCAAGCT 
TCATTCTGTC 
TCCTGGGTTC 
ATCACCATGC 
AGGCTGGTCT 
GGACTACAGG 
ATGTGAAGGC 
GGAATTAGTC 
TGGATGTGCA 
ATAAACAAAT 
GCTGAGAAGG 
AATATGACAT 
TTGAAAGATG 
CCAACCAAAT 
AATGAGACTT 
AAAGGGGTCT 
TCTGTTCATT 
CGCAGCGTGT 
GTCTATCTAA 
CCATCAAGGA 
AAAAACTTTG 
AATGACAAAT 
AAAAATAAGA 
GTGAAAAATT 
CTGGGCACAT 
TGAGCAATTT 
CCATAGCTTT 
CCTCCCAGTT 
TCTTCCTCTG 
CCTCCCGGGT 
ATGCCACCGC 
TCTCGAACTC 
AGGTGTGAGC 
CTGCTTGTGT 
ACTTCCTCCA 
TATGAAAACC 
CATTTCCCAT 



TGTCGTAGTA 
CCCACAGAAG 
TTTGCAGGCT 
CACTGACCCT 
CTGTTGCTTT 
AAATCCTGGG 
ACTGAGTCTT 
CCTCTGCCTC 
AGGCACACAC 
GTGTTAGCCA 
AGTGCTGGGA 
CCTATGTATG 
TGGAAACCTC 
TTTTTTGGCC 
TCTCTTTCTT 
CACATGGGAA 
AGAAAAAATT 
TTGTAAAAAT 
ACCCAGGCTG 
AAGTGATTCT 
CTGGGTAATT 
CAAACTCCTG 
CGTGAGCCAC 
AGGACCTAGT 
AAAGTGCTCA 
ATACCTTAAT 
GTGCATAACA 
TTATGTGACT 
AAATGCCTTA 
GCCATATTTG 
AATTTGACAA 
CATGTGTCAT 
CAGTCAGGAT 
ACGCTATGGG 
GAGTCTGAGC 
TGCCTCCACT 
ATTTGATACA 
AAAGGAAGGC 
TAAAACTGAC 
TTTCATTGAG 
TAAAAATGGA 
TCTTATAGGT 
TATATCCCTG 
GCAAATAAAT 
GAATTAGCCA 
TCATTCAGGC 
TCAAGAGATT 
GGCTGGCTAA 
CTGACCTCAG 
CACTGTGCCA 
TATGCGTGAT 
TTAATCAATG 
ATGAATTTAC 
GTCATCCCTA 



AAAGAGATTC 
CTAAATGGAC 
ATGTTAATAA 
TCGTAAAGAT 
TTCCTTAGCC 
ATCTAGGGAG 
GCTCTGTCTC 
CCGGGTTCAA 
CACCATGCCC 
GGATGGTCTC 
TTACAGGCAT 
AATCCCTACT 
CAGTAAAATG 
ATTAAAAATT 
TCACTTTAGT 
CCCTTTCAAT 
TGTCCTTTCA 
AGTTCTACAT 
GAGTACAGTG 
CCTGACTCAG 
TTTGTATTTT 
ACCTCAAGTG 
TGCACCCCAC 
GACACATAGC 
TTTAAAGTAC 
TCAAAGGCAG 
GATGGGACTA 
GGCTCTGCCA 
CATGTGGGTT 
CTTTACTTGG 
AGTGGGTTTG 
CCAAAGTTCT 
GCTCACTGCA 
CTGGCTCTTA 
ATTGCGATCA 
GAGGGGCCTG 
AAGGTAAGTA 
ATAGATCTTG 
TGGAACTATT 
GTTATTATGA 
ACAGTTTATG 
TACTCAATCC 
TAAATTCTAT 
TTTGCCAAGA 
ATTTTGCTGT 
TGGAGTGCAG 
TTCCTGTCTC 
TTTTTGTATT 
GTGATCCACC 
GGCTCTGCTG 
TCTTTGAGTT 
AGTTAAATAA 
CCAATTAAAA 
TAATTCATGA 



AGGGCACAGG 
GGGAAGCCTG 
GCTGAAGCTT 
ACTCCCAGTG 
CCATGAAATC 
TGGAATAAGT 
ACAGGCTGGA 
GTGATTCTCC 
AGCTGAATTT 
GATCTCCTGA 
GGGCCACCAC 
ATAATATTCT 
GAAATAATTA 
ATTTCTTCCA 
GCTTTTCTTC 
ATTGGTCTGT 
ATATATTGAA 
ATTAATTTTT 
ACATGATCTT 
CCTCCCGAGT 
TAGTAGAGAT 
ATCCACCTGC 
GTAGTTTTTT 
AATAACATTT 
CATCTCTCAA 
CTCGTTATGT 
TTGACTTACA 
CTGTCATCCC 
TTCTCTATTT 
TTATAAGATC 
TAGTGCTGGC 
ATCAGATCGA 
TACATCTGTG 
TCATGCACTT 
TCGCCATGGT 
TTGCAGATGC 
TGATGGAAAA 
ATTCTGTGGA 
TTTCTTTGAG 
TTATAAGGTG 
TGATGTCTTC 
TATTCAGTTC 
ATAACCAATA 
GAAAAATCAG 
TTGTTTGTTT 
TGGCATGATC 
AGCCTCCCAA 
TTTAGTAGAG 
CGCCTCGGCC 
TATATTTAAA 
TTCCTTTGAA 
AATCTTTGTT 
AAATTATCCT 
TTAATGATTT 



GAAACTCCAC 
CCACCAGGAA 
ATTCCGACAC 
TAACATTGGA 
ATCTGCGAGA 
TTTGGGAAAG 
GTGCAGTGGT 
TGCCTCAGCC 
TTGTATTTTT 
CCTCGTGATC 
GCCTGGCCCG 
CACCAAGCGG 
TTTCCCAGAC 
TTATATTTTT 
AAATAAGCAG 
GGTTGTTCCA 
TATCTTCCAG 
TTTTTTTTTG 
GGCTCATTGC 
AGCTGGGATT 
GGGGTTTCAC 
CTTAGCCTCC 

CCAAGTAGAC 
ATGTATTAAA 
ATAAACTCTC 
GCCCAGGGAA 
CATTCACTTC 
ATCATGTGTT 
CCATATTCGC 
TATTTTGGTG 
GCTGTGAGAG 
TTGTTGTCTA 
CTCAAACTTC 
GAACACCACT 
CTTCAATAAC 
TAGGGCTCTT 
GTATGGAAGT 
ACATTGCTTA 
GGGGAACTGT 
AATGAAAAAC 
TCTGCCTGTT 
GAAATGCAAA 
TTAAAACTTT 
GTTTGTTTTT 
TCAGCTCACT 
GTAGCTGGGA 
ACAGGGTTTC 
TCCCAAAGTG 
GTCTATTTCA 
CCAGTTATAA 
GTATGTTTAT 
TTAAATTATC 
TATTACATTG 
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213841 GACCTAGCTT ATTTACAATG AGTACATAAA TTTATTGTCT CCAGTCTTTC CTCCATTATC 

213901 CCGTCTACAT ATCCACACTG AGTAGATTCA CTACTCAGGA ATCTTGGACA CCTTCAAGTT
213961 GCCAAACATG CAGTGTTCAC TGGACATGCT GTGTTCCTTC AGAATTTGGG CCTGCTTCTC 

214021 AGCACACTCA CATCTGCTAT CAATGACCCA TGGAAAGTTT TTGCCCTGAG CAAGCCAGAG
214081 TCCCTGTTAG TTTCTTCCAA ATGCTACAAG TTCACTTTTG CTATTTTTTC CGATGAGATA
214141 AAATTTTCCT TTTTGACTTT CTACAAATCA TAGTCATTTT TCAAGGGATA GTTCAAGTAT 
214201 TGCTTCCTTT CTGGGACCTT CCCAAATTAT TATTTTCTCC TCTCAAAGTC TCTGTTTTAT 
214261 TTATGTTCAT CCTCAAATCT TGATTCTCAC ATGAATCATA TACCTTGTAT TATTTATAGT 
214321 TTTTTTGAGT AGGTAAAATA TTTCATATTT TATATTCTTT GGCTCTCTAC TTTATAGCAT 
214381 GATGCCAGAT ATTTAGGGGC CTTACTGCAT TTATTTTTTA TTTTATTTTA AAATCTATTT
214441 TATTTTTTAT TTATTTATTT TAAAATCTAT TTATTTTTAG GTAAATATTC AGGTAATATA 
214501 ATTTATGTAA TTATTTAGGA ATTTTAGGTA GTTATTTTAA AATAATTCAA ATTATTTATT 
214561 GAGTTATATC AGAAGAATGT GATCTTATTC ATTTGTAATA TGTGTTTTAG GAACTCAGTT 
214621 CAGCCAGGGC AGACCATAAT TCCCAAACTT GACTTTTCTT TTTAATTAGG CACTGATTTT 
214681 GGTTAAGAGT TCAGTAAAGT TTTGTGTGTG TGTTTTAAAA AATTCTTTGA TATAAGAGTC 
214741 AAGATGTTAC TCAACTTTTA CTAGAAGCAA AATAGAGGAA GTGCTTTCAC AGATGAAATA 
214801 TCTCTCAATG TTTTCTTCCA TTTACTTCTT CCTATTATTC ATCTATATAA TCATTTTCTT 
214861 TACCTCTTTT CTTCATTTCT TCTGTTTTTC TCTCCTACTA AGACAAGCAA ATTAGGGGTA
214921 TAATTGGTTA TTTGGGAAGG TAGGAAGAAT ACAGAGAGAA ACAAAAATCA ATATTTTATA 
214981 CTAGGGTCTC ACTAACCTCA AGCAACTCTG ACTGTAAAGT AGATTTTCAT AATAGGACTT
215041 CTTGACAAAG AGTTTTCCTA TTTTTCCCCC AGGCCTCTGT GTATCAATGG AGCCCAGAAA 
215101 CTCAGGGTAT CATCTTTAGC TCCATCAACT ATGGGATAAT ACTGACTCTG ATCCCAAGTG 
215161 GATATTTAGC AGGGATATTT GGAGCAAAAA AAATGCTTGG TGCTGGTTTG CTGATCTCTT 
215221 CCCTTCTCAC CCTCTTTACA CCACTGGCTG CTGACTTCGG AGTGATTTTG GTCATCATGG 
215281 TTCGGACAGT CCAGGGCATG GCCCAGGTAT CCAGATACTT TCTCATTCTT GGTGGGATCC 
215341 AGATTTCTGA ATTCTACAAA ATATCAAAGG TCTTAATGAT TTTCATTTCA GGGAATGGCA 
215401 TGGACAGGTC AGTTTACTAT TTGGGCAAAG TGGGCTCCTC CACTTGAACG AAGCAAGCTC 
215461 ACCACCATTG CAGGATCAGG TAAGTGTGCA CAGATGGGTC ATAGCTTTGT CATCTGTTCC 
215521 ATCCCACTGT GTCTTATCTT CTATGAATCA AATGGTTTGG GGAAGAGAGA GAAAAAGTAC 
215581 TGCTGAAAAA TTCAACAATA TAAGACACTT GCATCACAAA TAGGAAAGAT GCATCTGTGC 
215641 AGTAAAGACA TTGAAGCTTA G&AGTAGAAA AAACCATTGT GAGCTAGGTT TCAGCTCAGA 
215701 AAAGCCTTAG TAGTCAGAAA AGCCTTAGTA GTCAGAAAAG CCTTGTCGGA AAAAGTTTAA 
215761 ACCTTTAAGA ATTGCACACA TGGAAAAAGA TCAAGTAAGC TATATATACA CCATCTTAGC 
215821 AATGATTTTG AAGTGAGAAT TAAGGCTACC ACAGCTCCAG GTGGTAAGGA GAGAAATCAG 
215881 GCTGGAAGAG TTTGAAGTTT CTGTATTATT CTAAGCTCTT TACTATTCTA TTATGAGCTC 
215941 ATTAATTCTC ACAACAACCC TCTCATATAA GTACCATTTT AAATTCTTAT TTTACAGAGA 
216001 AGGGAGTTAA GGAAGGTGGA GATTAAGAAA ATTGCCCAAA TACAAATAGC CAGCAGGTGG 
216061 TAGGTCTGAG ATTTAAGCCC ATGCAGATTT TAGCCCCAGA GCAGACATTC TCAATCAGTA 
216121 TGCTAGACTG CCTTTCCATG GTATGTGATC CTACTCAGGC CTCTACAGCT TTATCATTGC 
216181 TGTTCTCCCC AGCCTGTCGT GCTGAGAGTA TATACTCGAA GAGCAGAACT AAAATTCCAT 
216241 CCAGCTTCTC ACTCCTAGGT CCACTACACA GCTGCATCCT GCAGACTTTT ACCTCAAGCA 
216301 ACCCTCCTGC GTTCTTGCTT CCTTCCATCA TAGTTGTAAC CATCTCCTCT ATTTGCAAAT 
216361 ACTATCTGCT GATCTCTCTC TTCTAGACTG GTTTCTTTCA ACCTTCTTCC CACCAAAACC 
216421 AAGTTAGCTT GCTAAAATAA AGATGGCGCA TTTTTACTCA CCCGCTTGAG AATTTTCAAT 
216481 GTGTTCCTTC ATGCTTACAG AGTAAAGCCT GACCTCTTTA TTGCATGAAT ACAAAAGTTC 
216541 TTAGCCATCT GGCCCCAACC TTGTTCCACT CAACTCCCCT GTGCAAGCAT GGCTCCAGTG 
216601 GCACTGGACA TTGGCTGCTC TCCACATAGA TCTGCACTGC ACTTCCCTCT GGCTCTGCTC 
216661 CCGTTAGTTT ATATGCCTGG AAAGTTCTTT GCCCCTGTTC CTTGTGCCAA AATTCCATCT 
216721 ATCCTATTGC ATAGCTTATG TAAAAACTTC CTAAACCTTT TTTTTTTTTT TTTTTTTTTT 
216781 TTTTTTTTTT TTTTTTGAGA CGGTGTCTCA CTCTTCCGCC CAGGCCGGAC TGCAGTAGCG 
216841 CTATCTCGGC TCACTGCAAG CTCCGCCTCC CGGGTTCACG CCATTTTCCT GCCTCAGCCT 
216901 CCCGAGTAGC TGGGACTACA GGCGCCTGCC ACCATGACCG GCTAATTTTT TGTATTTTTA 
216961 GTAGAGACGG GGTTTCAAGC CAGGATGGTC TCAATCTCCT GACCTCGTGA TCCGCCCGCC 
217021 TCGGCCTCCC AAAGTGCTGG GATTACAGGC GTGAGCCACC GTGCCCGGCC AAAACTTCCT 
AAATCTTATA 
TTATATTTAT 
TAAAATCCAT 
TTGTTGGAGT 
GCATATGTGT 
CCTGTAAAAT 
TGTAGACTTC 
TCTCTCCAAT 
AGTCTTTCAT 
CAAGAAAATG 
ATTTGGTCAC 
GTTAATTTTA 
AAATACTCTC 
GGAGGCCACG 
AGGGTCAGCA 
GAGCTGGCCT 
CCATTTCCTG 
AATGTCCTTT 
TAGTCACACA 
GAGGAGTTGA 
CAAGCACCTT 
ATGTCAAAGA 
TCTGTCCAAG 
ATGTTCCCTT 
TTTATGATAT 
CCACTAGACT 
TCATCATGGT 
GGGGA7TTAA 
AATTCCAATA 
TTAAATATAG 
ACATGTTAAC 
CTTGAATACA 
TTTCATGGAA 
AACAAGACAA 
ATTGGAGACT 
GGCTCTGACA 
CAGGTTTGTA 
GCATTAATTG 
CCCAGTAAAC 
GTGTCTGCTG 
TAAGTGTTAG 
CTTGTACCTG 
ACCATCTTGG 
TGGACGAGCT 
GGGTTTTTTC 
CAGTACTCTG 
ATGATAATGG 
TAACCATTAA 
ACAGATGTGG 
AGTGATAGAG 
TGATTCCAAA 
TAATTCCAGC 
GCCTGACCAA 
GGTGGCAGGC 



ATTATTATCA 
ATTTTACATC 
TGAGCGGGTT 
GCATTGGACA 
GTGGTTGTAT 
GCATTTCTTA 
CAAAGCCTAC 
TGGACCAGAA 
TTCCTGCCCC 
CTAATGGGCT 
ATTGGTGTTG 
ATTATATCAT 
ATTGCCCAAT 
AAGTCTCAGC 
TTTGGATCCT 
TTTATCTTCT 
AGCATCCATT 
ATCAAATGGA 
ACCTGATTAA 
CTATTCACAT 
CTGCAGAATC 
TAGTGAAGTA 
ATGCCTTTCA 
CCCCATGGGC 
TTCCTCTCTA 
GTGAAATGCT 
GCCTGATTTT 
AGAAAACTAG 
ATAAGACAAT 
TCCTGGCCTG 
CAGGTATTGT 
AATAATACTG 
GGTTGTTTCG 
CTTATGTGTG 
TTAAAGTAAT 
TTGACAAATG 
GAAGGATTGA 
ATTAGTGTGT 
AAATCTACCT 
TCTCCTATGG 
GGAAAAGGAG 
TGGCCCATGC 
CTGCTCTAAT 
GTCCCCATAA 
AGCCATTTCT 
CTCCATGTTA 
TAATAAGGAG 
TTTAACCTTC 
AAACAGGACA 
CTGCTGCAGC 
GCTTCTTTTA 
ACTTTGGGAG 
TATGGTTTAC 
ACCTGTAATC 



ATTTATCCTC 
TTTTTTTTCA 
AAAATCATTA 
TGGTAAAGTT 
GTACAAGTGT 
CTATAGGTCT 
ATGGCATTTC 
GCTCTTTGAG 
TAGCCTCATA 
GTGATAGCAG 
AGGAGCCATT 
ATTACTTTAC 
AATTCTAAGT 
CTTTGATATT 
TCATCATCCT 
ACATCTTTGG 
TTGGCACCTA 
AGATGATAAA 
CACCTTCCTG 
GGCACCCACC 
TCTACCACCA 
CATTTTCAAT 
CCTGTTCTCT 
CCTTCCAGGG 
GGTTATGTTG 
TGAGGCAAGG 
TAGCTTTAAA 
TCCTCAGAAT 
TTTCTACACT 
AATGGCTTTC 
ACAAAAATAT 
TCTCTTGTAA 
TGTATGTATG 
CATTAAGAAG 
TAATCAGCTA 
GTGGCTTTCT 
AAGAAAGAAT 
AGAAGGGAGA 
AAAAACTAAT 
TTCACAGTGA 
CACATCCTGT 
AGAGGTCTCT 
ACTCATGCTG 
AGGCGATGGT 
GGTTATGCAC 
ACATCAGAGA 
AAACAGTTCT 
ACAATGACCT 
CTTAGAGGTG 
ATCCATATTC 
GAAATAATAT 
GCCGAGGCAG 
TAAATATCAT 
CCAGCTATTC 



AGATATACTT 
AATTGCAGTT 
TTTTAAAAAA 
AAATATCGAT 
TTATGCATAT 
CTGTGAAATA 
ACTAGTGACA 
GGCAGGGGCT 
TTAGATCATG 
AGAGTTACTG 
GAAGAATCAG 
TGGGGAAAAT 
CTGCCACCTC 
TTCATAAGTG 
CTGTGTGGGG 
TGAGTCACTT 
CACCACCCAC 
AAATGTCAAC 
GTGGTTCTGG 
GACTTGTGAT 
CATCTGAAGT 
GTGTCTTCAT 
ACCAAGTTAA 
CTTACCCTGT 
GTGTGTAATT 
AATCCATTCT 
ATAAAAGAAT 
CTTTTAACAT 
TGATTTTGTT 
TCATTAATGA 
TTCTTTTGGG 
GTGCATTGGA 
ACTGCAAACC 
TTGCTGCCTA 
TGCAATGCCA 
ATTTGAGACG 
GGGAACATTT 
GGCATGCCAC 
TTTATCCCTT 
TTTATGATGA 
CCTCACTGGC 
AGGGCAGGGT 
ATTAGATCTT 
CACATGCCTA 
CATCATCCTA 
TGTGAGTTTA 
GTGTTACCTA 
TGAGAGAGGC 
AGATAACTTG 
TTAACCACTA 
TGCTGGGCCA 
GCAGATCATG 
CTACTAAAAA 
AGGAGGCTGA 



CCACGTACAT 
TGGGACCCAT 
TGAGTAGAAT 
TCATGAAACC 
TGGTGTGTGT 
TGTGTCTTGT 
ATCAATTTTA 
GTATCTTACC 
CAAGAATGCA 
TGACAAACTA 
AGAGTGTGTT 
CTGTGAGCTA 
ACTGTTGGGA 
TTTTTCTCCC 
GGACTAATCT 
TCTCTTAAAT 
ATTCTTCCTA 
GGTTGGTATC 
GAAGCCACAC 
GCAGTCTTGT 
GCCTGCTATA 
ATTTCATTAT 
TCTTGCAAAG 
CAGATTCTGG 
ATTTATTTCT 
ATGTTTTCAT 
CAGTGAATCC 
AGAATGTTCT 
TTTATAGCCA 
TGCTAATTAT 
AATCCATAAT 
AATTTTTCCC 
TGACTATTCA 
AAATACATAA 
CGCTCCTGTT 
TAATATCTAA 
AGGTCCTTAT 
TTCAGAGGAA 
CTTCCCAGGT 
CCCCATGCAT 
TCAACAGGTA 
GTGGATCTCC 
TCTTTTCAGC 
CCACTTTGGG 
ACATACCTAC 
CTTCCTATAC 
TTACATTCTG 
ATTGTTATAA 
CCCCAGGTTG 
TGCTATACTA 
GGCATGGTGG 
AGGTCAGGAA 
TACAAAAATT 
GACAGGAGAA 



TGTAGTTTTA 
TAGTGAGTCA 
AGAATAGAAA 
ATCGTTTGAG 
GTTATGTTAC 
TGTTTTTTAA 
TTCACATTTT 
GATTTTTGTA 
ACTGTAATCA 
AGGGATTTAG 
ACTATTATTT 
TTTTAGAAAT 
CATTGTTTAG 
TTTTTCCTTT 
CACAGGCCTT 
CCTAATGCCT 
TATGAAAGAA 
ATTTTTAATC 
GCAAAAGGTA 
CCTTCCATAT 
TGCAGTTAAG 
AATTATTATT 
TTCAATTCAA 
CATTCTCTCC 
CCTTTTCTTT 
CACTTGGGTG 
AGTAATTAGA 
TCAAATAAGG 
AATGGTGTCA 
TTTGGTTTGT 
GGATGTATGG 
TGCCACATGA 
GATCTTCCGC 
CACTGTAATC 
ATCTCCAGAG 
AAAGCTTTAA 
GGTAGAATAA 
ACTTCCTTCC 
AGCACTGGCT 
CACCCGTGCA 
CAGTGCAGAC 
TCTGAGAGGC 
CCAGTTCTCC 
CCATTTTCCT 
CAACGTATAT 
TTCTACGAAA 
GCTTTACATA 
TTCCCTTTTC 
CACAATACTA 
CCACACCAGC 
CTCATGCCTG 
TGCAAGACCA 
AGCCAGGTGT 
TCGCTTGAAC 
CCAGGAGGTG 
GAGTAAGACT 
CCCAGAGTGA 
GGATGCTAAT 
GTGTAACTGA 
CCTTCAGAGT 
AGGAGGTCAG 
AAAGCTCTTT 
CTGCACATGG 
CCATCAATAT 
TTGCTGATAC 
TTAGATATCG 
GAGGTGTTAG 
GGCAGAATGA 
GCATTTTTAG 
AAGCTTCCCT 
TGGAAAAGTT 
TATATATATA 
AGACTTGCCA 
ATCCCCATTT 
CACATAAAGG 
CCCGTTGCAC 
GAGACTGCAT 
AGACATTGCC 
CATTAGAGCT 
TAAAATCATT 
GGGGATTTGG 
AGGTTGGGTC 
ACACACGGTG 
AATCAGACTC 
GATGAATAGA 
ACTTCCTAGA 
TGTCAAGGAA 
CAGACTAATC 
AAGATAATTT 
TGTTTGGAAA 
ACTGACTAGA 
GTTCCCAGGT 
TGACTTAGTA 
TTGAGATGGG 
AAACAGCAAC 
AATGCCATTT 
GAAAAACATT 
ATGTGCATTT 
ACATTGTTCC 
AACTGGTGTA 
TTGAGTCTGG 
TCTTTTACCT 
CCCGCCTCTG 
ACATTTTTTA 
TGTGTTAGTC 
ATTCTGAATG 
GAGGTTTCTA 
TGTGGCCTTT 



GAGGTTGCAT 
CCGTTTCAAA 
TGCAGCTTCT 
TTTCCCCCAA 
CAAATTTTGG 
GGAGTTCTGT 
CTGGCAGATT 
TCATCTCTTG 
TCTCAGAGGG 
GTGCTGTGGC 
TTATTCCTGG 
CCCCCAGGTA 
ACCTCAGTGG 
CAAATAACTA 
AACAACAATT 
AACAGAGATT 
TCCATGGTGT 
TATATATATA 
TATATCAACA 
TATAAGGGAG 
CAGAGCCAGG 
AAACTGGCTT 
TGCTCCCTGG 
CTGAATGTCT 
GAATTGCATT 
TATAAAATCA 
GCTCATCGCA 
AGTTTATTGA 
CTCTAAAGAT 
TGGTAGGTCA 
TGTTAGATTG 
GGTACATGAG 
TAGCAAGAGA 
CAATTTTTAA 
AATGTCTGGA 
TGCAGGCTCA 
ACCAACTTAC 
TAATATTTGA 
GGGCTTTCTG 
TTATAGTGAT 
AACAACAACA 
TAGGCATAAT 
AGTGTATTTT 
TGGCCATTTT 
TTATATTCCT 
GAAGGAACTT 
TTGGAGGAAT 
CACGTTTGGA 
AGGACATAAA 
CTTCTCTCCA 
TTCCCTGGGG 
AATTGGTCTG 
GCATGCGCCC 
GAATTTTCCT 



TGAGCCAAGA 
AACAAAAAAC 
GGCCCTCTTA 
ACAACCCACA 
TGCTAACGTA 
CCTCCCTGCC 
TCCTTTTGTC 
GTAAGGATAA 
TTCCCTGACA 
CCTGCCCTTT 
GACCAGTAAC 
AGAGCTCTAC 
TCGCCGTGAA 
CAAATATCTG 
TCCAATCTTG 
GAACTGTGTA 
TGTTCATATT 
TATATATATA 
CATCTAATCC 
AAGGCTGAGG 
ATTTGGACTG 
CTACACTGAG 
TTATTGACTT 
TTAGGTGAAT 
AAAGTTGAGT 
TCTTCCCATA 
GGAATCATCT 
ACATCTTCAA 
CTGGATGGCA 
GATTTCCCAG 
ATTAAAATGA 
CATGAAACAG 
CGAAGACAGA 
AAAATCACAA 
AACAGATCGG 
TGAGGAAGAT 
AAAGAGAAGT 
CTAAACTGCT 
AGGAGGGTCA 
AGTTGTCAAC 
ACAAAAAAAA 
TTTAAATGAG 
ATTTTTGTTT 
GTTTCCAATA 
TGTGATCAAC 
GTGAGATTGA 
GTCTTTTTCC 
CAAGCAGAAC 
GTTACAAACT 
TATTCCTGAC 
AGCCTTTATA 
GGGTGGAACC 
GGGGTTGACA 
CATTGGAAAG 



TCATGCCACT 
CCAAGAAATT 
TCTGAGACAG 
GTATCATGGG 
TCTCTATAAC 
TTTTATTGCT 
CAGGAATCTT 
GCGTGTGGGC 
GCATGTCCTC 
GTGGCCTCCA 
CTATGTGACT 
CTGTTTTTTC 
ACTCTTTAAT 
TCTGTGGCCA 
GCCAGTAATC 
TGCTGGGAAA 
AGCTACCACA 
TACAGTCACA 
TCACAGTTAT 
CACAAGGAGG 
GGGGAGTCTG 
CAGCCAGGGT 
GGTAGATTGG 
GAAAAACTGC 
TGCTGCAGAA 
GATATGCAAG 
CTTCCACTGC 
GTGGCAGGTA 
ACACAATTAC 
AGGAAGAAAA 
GCTGTTCCGG 
TTCTTAGTTA 
GGGGCAAAAG 
AAGGGAAACA 
CTGTGAGACA 
GAAAAGACAG 
TTTGTTTTTA 
AGGAATCCAC 
CACAGAAGAC 
AGCCAATACA 
AAAACAGAGA 
TAATATTATA 
AAAGAAATAA 
GTTTCATAAA 
ATTGCAATAC 
TCATTTTCTC 
TGTCTGCTGC 
TTCAAGACTG 
TAAATGTGGT 
CATAGACTCA 
AGACACTGAT 
CAGATACTAC 
ACAGCTGGAC 
TACTAAATAA 



GCACTCCAGC 
AATATTGCTT 
TGTTCTTTTA 
GGTAAGTTAA 
TACTCTGTAT 
GCTGCAAGCT 
CTCAGATTGA 
CCATTTAACC 
ATTGCCCAGG 
GTTACGTGAT 
CAGGGTTTAT 
CCCTCCTCCA 
GTTACTGACA 
TTTTTAGAAC 
ATTTTGACAA 
AGGCCCACAC 
TATATATATA 
ATAAGCCAGC 
ATTAGGTAGG 
TTAAATGGTG 
GCTTTGGAGT 
AAAGAAACGT 
TAATTTCAGG 
ATTAAGCAAA 
GCTGTAGGTG 
TTTCCTCATG 
CACTGGATTC 
TTGTTTTAGG 
TCTATTTACA 
ATATAAGCTT 
TGCAGAAGAC 
TGACCAGAAT 
AAGATCATGA 
AAGTGTCCTA 
TTGCAAGGAG 
ACCCAGGCAG 
CTACATTTCT 
TGTGACTATA 
CAAAGAGAAC 
GAAACAAAAA 
AGACACAAAC 
TGTTGAAATC 
CCATCTCAAC 
CTTTCTTAAG 
ACAACTGGGA 
TGTTTTTTAC 
AGTCAACATG 
GGCCAAAGAG 
ACTGAGCATG 
GCAGTTCTTA 
ACTTGGGACC 
TAATTTTTAG 
AAACTTGAAA 
ATAAAAATTC 



CTGGGCGACA 
TTATCTGGAG 
GTGTGAAAAA 
TGGCTGGTCT 
AAACTTCCTT 
GTACAATTTT 
TCACTGTGCG 
AATCCCTTTT 
GCTCCTCCTT 
AACCATTATT 
CATCAACACC 
GACCCCTCCA 
TTGCACTAAT 
AACAAATGTG 
AAACCTTCCC 
ACAGGTGATT 
TATATATATA 
TCCTGTGCCA 
CCCTATTGTT 
TGACTATGGT 
CTGTGTCCTG 
GGTTCCCAGA 
TTTGGCAAAT 
ATGACTTTGC 
GCTTTCTATA 
GGAATCTCAA 
CTCATCAGTC 
TGTTGGAGAT 
TGAGCCTCTA 
ATTTTCTCAA 
AGCACGTATG 
GAAAGACACA 
AGAATATGTT 
GGCCAGTTTA 
GCTTGCTCGG 
GGATGGAAGG 
ATGTGATCAA 
ATGCTGGAAA 
TCATGTTGAA 
AAAACAAAAC 
ACAATGCCAC 
CAAATTTTCA 
TCAGAACCCC 
TAACTACTGC 
GGGCTACTAG 
ATCTAGGATT 
TTTGGCCTGG 
AGGACCCTTA 
AACTTTTTAA 
ACTCTGGCTG 
CACTCCAGAG 
ATACTCCTTA 
AGTCAATTCA 
ATGTGAAAAT 
GATCACTGAT 
GAATTGTAGC 
AGAATTAAAG 
TAAATTCTAT 
CAGATATTTC 
ATGATGAGAA 
AGAGAAGATA 
TAGACACTTC 
ATGACAGAGA 
ATAACAAGAT 
TACAAGAGAA 
TACTTTCCCA 
AACACTTAAG 
CAACATGTCT 
CAAGAACCTT 
TTTTGAGCAA 
AACATTGATT 
CTAAAAGAAG 
CAGAAAAAAG 
GGGCCAATGA 
TAAAAAGATG 
ACCACTATAA 
ACATGTGGAG 
CAATATCTCC 
ACATATACAT 
GAACAGGAAG 
CAATGAGAAT 
GATTAAAGGA 
GCTACAACCA 
GTTGTGGCTG 
CTTGACTTGG 
TGAATGTAGA 
TCTGACTTCG 
CTT TTmTT 
AGTGGCTCGA 
TCAGCCTCCC 
ATTTTTTAGT 
CGTGATCCGC 
CGGCCCCTGG 
GAACTTCCAG 
GATCAAACTG 
ACCCTGAAAG 
TGACTCAGAA 
GATTGATTCC 
ATGGTTTGAG 
AACTGATAGG 
CAATAGTCAT 
GAGATGACTT 
TAATGTTGAG 
GTTTATTTAG 
TGTTTTGTGT 
AGGAGAACTT 
AATTGCTGTT 
AGCCATTTTG 



AAATATCTTC 
CATATGTTAC 
TTTTTATTAT 
TAAAATTTAC 
CATCAGATTA 
AATGACCAAT 
GAACTGGAAA 
CAGAAGGGAT 
CACTTTCAAG 
GAGAAAAGGC 
GTTCCTTTTG 
AGATACTCAG 
ACATATCCTT 
AGAGAAGAAG 
GGGCTAATTC 
GTTTTTCAGA 
TTGCCTTCAT 
AAGTAGAAGA 
AGGAAAAAAA 
CTTGAACAGG 
TTCAACTTCA 
AATTAACTAA 
CAACTGGAAC 
TAAAGCTAAA 
CCATAAAACA 
TTGTCTGTTA 
TAACAGACCC 
AGACAAAACG 
ATCCGTCCTG 
GATGGATGGT 
ATGTGTGTTT 
AAATAAAACA 
TTTTGACCAA 
TTTTTTTTTT 
TCTTAGCTCA 
CAGTAGCTGG 
AGAGACGGGG 
CCACCTGAGC 
TCCTCTGCTT 
TATCAGAGCA 
CAAGTTCTCA 
CATCAGTTGC 
TGCCTAGGTT 
TGACAGATGA 
GAAGAGTTAC 
AAACATTTCT 
GAAAATTAAT 
ACTTTTTCTC 
CTTTCCCTTG 
GACTTTGGCT 
ATCTTTTTTG 
TCCTTTTTCC 
GTTATTTGAA 
AGGAGACTTT 



ATGGTGGGGC 
AGATCTCAGC 
TTTTTATACA 
ATGCTAAAAT 
AACAGATATT 
ACAAGATTAA 
GCTTGTATTG 
AGCAATATAG 
TGAAATGACA 
ATAGAAATGT 
AGCGTAGAAA 
ATAGGCAGCG 
TAGTTTGTCT 
TTCCCACCAT 
AGCAGATGAA 
AAAACAGAGT 
GATATTGACA 
AAAAAGAAAG 
ACCAAAAAAG 
GACTTCATAA 
TTAGTCATTA 
TGGATAAAAT 
TTTCATACGT 
TGTACAATTC 
TGTACAACAA 
AAAAAAGAAT 
CAATATATAA 
CACATTCTTT 
TTAAAAATCA 
ACTTAAGAAG 
ACTTTGTGAA 
GAAAGCAAAT 
TGGAGCAGTT 
TTTTAGACAG 
CTGAAAGCTT 
GACTACAGGC 
TTTCACCATG 
CTCCCAAAGT 
TCATGTTCTT 
GGAAGGAAGG 
AACAGCAAAA 
TTCCAATTGC 
TTCCCAGCAG 
CTTCGGTGTG 
CATTCACATT 
AATTCATCTC 
TCACTTTTCT 
CTTGACTGTT 
AATATTCTTT 
GATGTACTGA 
TGTCTGGATA 
CCATTACTCT 
AGCTTGAAAG 
GATAACTTTC 



AGGTTATTGG 
ACCGATCAGA 
TTGTAAAACA 
AAAATAGACC 
TATTTATCCT 
ATAAATGAGG 
TGAGAAGAAT 
TTTAGACCAT 
ATTTATATGG 
ATCACATACA 
AAGATAATTT 
TCAACTCTAA 
CCTCACACAG 
ATTTTAAATC 
GAGAATCTCC 
GTCAGGCCCT 
ACACAAAGAG 
ACATAGTATA 
GGTGGGGGAC 
AAGAGAAAAT 
CAGAAATGAA 
GAAAGGAGAT 
TACGAATGTG 
CAGTGACTCA 
TGTTCATAGG 
GAGTAAATAA 
TAGATGAATG 
TAAAGGTTTA 
GTGAGCGATT 
TGCTCCTGGG 
TATTGTACAT 
TCAAAGTATC 
GGGAAGGGGT 
AGTCTCACTC 
TGCCTCCCGG 
ACCTGCCACC 
TTAGCCAGGA 
GCTGGGATTA 
CTTGGTCCTG 
CAATGGGTCA 
TTAATGAGCT 
ATCAGTTGCC 
CTTCTCTGAG 
TCAGACTTTC 
CCTAATGGCT 
CCCTCCCCAT 
CAAATAGTTT 
AAATATTATG 
TGATGTACGA 
TATATGAGAT 
TGGAGCTTAT 
GAAAAAGATT 
CATTGGTTTG 
TCAATTTCCT 



ATGCAGAGAA 
ACTGTAAAGC 
TAGACGTTTA 
ATTTTCAAAT 
AGCCCAATTG 
TTAACTTAGA 
GAATGTGAAG 
ATAATGAAAA 
GGGAGAAAAA 
AGGCATAGAA 
AACCTTCTTC 
CAGGAATTAA 
AACTGATTCT 
CTATTAAAAA 
TAATGCAAAT 
GAGGGTGGTA 
GAAAGGGGGT 
ATAGGTAGTC 
AGACAACCCA 
GTAAGTGGCT 
AATCAAAACT 
GGAAAACAAA 
AACTTTGGAA 
GACATTTTAC 
AGCACTATCT 
ACCACGGTCT 
GGTCTCATAA 
TAAAATACTT 
TCCCTTGTGC 
GTACTAGAAA 
TTATGATTTG 
ATCCTTTTGA 
CTTGGTCCTT 
TGTCGCCCGG 
GTTCATGCCA 
ATGCCCGGCT 
TGGTCTCGAT 
CAGGTGTGAG 
TTCCTCCTCC 
ATCGATGCTG 
CAGGCTTTGA 
ACGGGTGATA 
GTTTTCCCAG 
AGGGTATCTT 
TCAGAATAGA 
CCCTAAAGGA 
ATTGTCATCT 
AATTATATTA 
CAGAATTTGA 
TGGCTCTGTA 
GCTGATTTCA 
GACTAGAATG 
TAAAAATCAT 
TCAGTTACTG 



GATCTGCTCG 
TATAATCCCC 
TTTATGTGAT 
TATTTAGATC 
CAAGAGATTA 
AATCAAGGAC 
GAAGGCAATG 
TTGGAGAGAG 
TATTGAAGAC 
GTGTATCACA 
ATATTTTTCT 
TTTGGCTCCT 
GGTTTTGCCA 
ACTGCTTGGA 
CAATGGGTAT 
CTAAGATGAG 
TTGCAGAAAA 
AAATTATGTA 
ACTAAAAAAT 
CCTTAACATA 
ACAATGAAAT 
ATGTTGCCAG 
AGCTGCTCGG 
TTAGAAATGC 
GTAATAGCCT 
ATTTGTATAG 
GCACAATATT 
TTTAAAAACA 
AGGGATGGGG 
TATTTTATTT 
TGCACGTTTA 
GAGCTTCTGC 
CGGTCCTTTG 
GCTGGAGTGC 
TTCTCCTGCC 
AATTTTTTGT 
CTCCTGACCT 
CCACCGCGCC 
TCTTTTGTTG 
TCAGCTTTTG 
AGAAACCATG 
AGAACAATGA 
CAGCTTCTCT 
TCCTTATGTG 
TGCAATTGTG 
TTGTTTCTAA 
ACCTAATGAT 
ATGTATTTCT 
TTCACTAATA 
TGCATACATG 
AAAACAAGAA 
GAATTTTTAT 
GCAGGCTGAA 
GTCTTTTAAG 
GGGTTTTATA 
TTTTACACAC 
TAATAATATA 
TTGCATGTGT 
TAGCTTGTCT 
AGGCTGGAGT 
GATTCTCCTG 
CTAATTTCTG 
CTCCTGACCT 
AGCCACTGCG 
TCAAGCTTAT 
ATTTATTTTT 
TCCACTTTTG 
AATGCTTTAT 
TTTTTTTTTT 
TGGTCTGAAA 
GACAGACATG 
ACACACTGAG 
CTCTGACCTT 
TAGCATTTGC 
CTGAAACCAA 
CTTAGGCTCA 
GCTGGACACT 
TTTTCATGTC 
GAGCAGTACT 
CTTTATGATC 
TTTGTTCCAC 
TCCAATATGA 
GGTGCTATAG 
CATTTTCTCT 
TGATTGTCCT 
AACCTTGTTG 
TTTCTCCATG 
TTCCATCAAT 
CCCTCCTTCC 
TGTAAACACT 
TGCTAATGAT 
GCTGTGTGTA 
AGAGGACTGG 
GAACTCATTC 
TTTAAAAAAC 
TATGTCATGA 
TGTTTAATGT 
CAAGTTAAAA 
ATCAAAAGAG 
TCTAATATTT 
CCCAGGAGAC 
AGAGTGAGAC 
AAACAAACAA 
CTAGTTTCCC 
GACATATGAG 
GCTGGAGTGC 
TTCTCCTGCC 
AATTTTTGTA 



TTTTTCTTTG 
TATTTAAAGT 
TTCATTTTAT 
GCTTTCTTTC 
TTTATTTATT 
GCAGTGGCGC 
CCTCAGACTC 
TATTTTTAAT 
TAGATGATCT 
CCCAGCCCTG 
GTCCTATTTC 
CATTTAATTA 
TGGGCAGATT 
TTCTCAAGTT 
TTTTTTTTTT 
CTCCTGGCTT 
AGACACCATG 
GCATCCTATC 
TTGCAGTTAA 
ATTCTGTTGG 
GATGAGGCAA 
TGCAAGAACA 
TCTTGCTCAC 
CATCCTTTAT 
TGGATGAGCC 
ACTGTGGAGC 
AATTTGTCTT 
GGAAGTCTAG 
GATTCTCTTT 
TGCTTTTTGG 
CAATTTGTTT 
CCCATCTTTC 
GACTTTTTGG 
TTCAACTTAT 
ACTTTAGAAA 
TTCTGGTTGT 
TAACACATTC 
TTTTTTTTAA 
CCAGAGTGGG 
TTTCAAATGA 
TTGATATGAA 
AATACTTATT 
TTTCTTTTAT 
ATATTCAAAG 
TCTGAAGACC 
ACTATTTATA 
GGAGGTTGCA 
TCTGTCTCAA 
AAAAATCCGC 
TTTCCTCTCA 
GTTTTTGTTT 
AATGGCGCAA 
TCAGCCTTCC 
TTTCTGGTAG 



ATCAATTTTG 
ATATTTGCAA 
CTATATCTGA 
TCCTTCATTA 
TACTTATTTA 
GATCTCGGCT 
CCGAGTAGCT 
AGAGATGGGG 
ACCCACCTTG 
CTTGTCTTTT 
CCTTTGCTTT 
TGAAACAGGT 
ACATTTTGCT 
AATAACCTAT 
TTTTTTTGTA 
CAAGGGATCC 
CCCAGCCATG 
ATCTCACTCT 
TGTATTAATT 
GTATTATACT 
GTGAGGTGCC 
GAATTGGCAC 
TTAGCATACC 
CCTTCTTCAT 
TCTGAGTCCC 
CTTAAAACAT 
ATTCAGAACA 
TTAGCCAGCT 
ATCCTGGAAT 
CTGGTGGTCT 
TCTTTACTAA 
TGGTTTCTGC 
TAGTGGAGGC 
TTCCTAAAAT 
GGAAAGGCAT 
CAACAAAGGA 
ACCTTGGCTC 
TCACTGAGAA 
AATGTTCTGA 
AGCTGGCATA 
TGATACAATA 
CTAATTATAG 
TTACAAAACA 
GAATGCCTAA 
ATTTAGCTAT 
ATCCTTAAAA 
GTGAGCCAAC 
AAAAAAAAAA 
CTTAACATTA 
GCCCATTGTC 
Tl ' T ' lTrmT 
TCTTGGCTCA 
AAGTAGCTGG 
AGACGGGGTT 



ACCATTTATG 
AAATTCAACT 
GGTTTTAGCT 
GACTACTTAG 
TTTTTGAGAC 
CACTGCAACC 
GGGATTACAG 
TTTTGCCATG 
GCCTCCCAAA 
TATTTTATAT 
ACTTCATATA 
TAAAGCTTAG 
GTGTTGTGCT 
ATAGTAAAAA 
GATACAGGGA 
TCCTGCCTTG 
TCTCTCTCCT 
TGGTTTCACT 
TTGCATTGAG 
TTTCACTGTT 
CAGGAAGCAA 
ATGAGAGTGA 
CCTGGACAAT 
CTCAAAACAT 
ACAGTAGCTG 
TGTAATATTA 
GTATTGACTT 
ACTTTTTGTA 
TCCTTCACCA 
TAGAGTTTCC 
GAATCTCTCT 
TGACTTTCAT 
AGGCAAACAC 
TGCCTCAGAA 
CCACACTTTA 
GTACTTCCAA 
TTGGTTTGCC 
TATGCACAGT 
ATTCAGAATA 
TTTTCCCAGA 
AAGTGGTTAG 
TCACTCTTCA 
ATTTATTTTT 
AGTTTTCAAA 
CCAAATTGTT 
ATTTGCCTTA 
ACAGTGCCAC 
AAAAAAAAAA 
TTTGTTCATT 
ATATTTTGAT 
TTGGAGATGC 
CTGCAACCTC 
GATTACAGGC 
TCACCATGTT 



TTATCTTGGA 
GTTTTATCAG 
TCTTTGTACT 
TCATTTACTA 
GGAGTCTCAC 
TCCGCCTCCC 
TCATGCACCA 
TTGGCCAAGC 
GTGCTGGGAT 
TTGATTAGCT 
AATTTTGTTT 
AGGAAAATTG 
CCCAAATTCA 
AGTGGCTGTT 
TCTTGCTGTG 
GTCTCACAAA 
TATATATAAT 
ACTGTTCTCT 
TAGTTTCCAT 
ATTTGAACAT 
TATTTAAGGA 
GTGCCTCCTT 
GAAGTGTTTT 
TTCAATGGAG 
AGAATTTATT 
ACTTAGCTGG 
CCTGCTAGTC 
GGAGAGCTAT 
AGATGTGCCA 
TTCGATTTTG 
TCTATTTATC 
TTTTGGACCT 
TTTCCAAAGT 
TGTGCCTATG 
TTTAGGTGCA 
ATATTGGTTT 
TGCTCCCTCT 
ATTGTATGTT 
ACTGAAGCAG 
GCACCAAATT 
AACTTTTATT 
TCTTATTTCA 
TGATGAAAAG 
ATTCTTTTAC 
TATTTTTAAG 
GCACAGGAGA 
TGCCCTCCAG 
AAAAAAGGCC 
AAAAACTTTC 
TTTTATCACT 
AGTCTCCCTC 
TGCCTCCTGG 
ACCCACTACC 
GGCCAGGCTG 



GGATCATCTA 
GCTATCTTTT 
TCTGACCCAA 
ATTTTAAGAA 
TCTGTCACCC 
GGGTTCAAGT 
CCATGTCTGG 
TGGTCTCAAA 
TACAGGCATG 
TTATCTTTTA 
TGGATAGTTT 
CTCCTCTAAG 
TTGTTCTTTT 
GACTCTCAGC 
TTGCTCAGGC 
ATGCTGGGAT 
AAGAAAACAG 
GGAAGTTTTG 
AGAAGAATTA 
AATTTGAGGG 
GGCATCCTTT 
AATTTTGAGT 
TTGTTTTGTT 
TATTTTTTTG 
TCATAGTACT 
GAACAGAAAT 
TCTTCTGATG 
GTTTAGGCTA 
AGGTGTTAAT 
TTTTATTTAG 
TGTATGGTAA 
TTTACTTTGC 
CTTTCTCAAT 
TCCACAATAT 
ATGCCTGAAG 
GGGGATAACC 
TCTTTTATCT 
TTATTATAAG 
TACAGGATAG 
TCAATATATA 
AAAATAAACT 
TCTTATAACA 
TTTTAGAAAT 
ATGTTGTACA 
CAGTATCCCT 
ATTGCTTGAA 
CCTCGGCGAC 
AAAAACAAAT 
TTTAATACTA 
TGCTTTGTAG 
TGTTGCCCGT 
GTTCAAGCAA 
ACGCCTGGCT 
GTCTCGAACT 
CCTGACCTCA 
CCACCTGCCC 
TAGGAATTCA 
TTATATTGAT 
GATTTTTGCA 
TTCAAATAAA 
GCCTCAAAAG 
TGTATTTTCC 
AGTAAAGTGC 
CAGACCGTTT 
ACACATAATT 
AAGAAATCTT 
TAGGACCAGT 
GGGACTCCTC 
AGGTGATTTC 
ACGGAGCCCA 
AGCTTTCAAC 
GAAGATGATT 
TTAAGAGTGA 
TGGGGGAACT 
GAGCCCCGCA 
GGAGCATGAA 
GAACCGATTC 
CATACAGAGG 
AACTGTGTCA 
AAATTGAGTA 
GTTAACTATT 
TCTTTAGTCT 
TATATCCTCG 
CCACTTTTTT 
TGTCCCTGTC 
GTATCACACA 
AGACAAATGC 
GTTATATGGA 
AAGATGACAT 
AGCCCTGACT 
CGGAGTGCTG 
AATAAATAGG 
ACAAGGTAGA 
ATTAAAAGTT 
TGTGCATGAG 
GCTCCATTTT 
TTAAACACAG 
AAGAGATGAA 
ATGATGCCTG 
TTTACTGCTA 
TAATATATGA 
GCCCCATTGC 
TTTGATTTCA 
GTCTTGTTGG 
TCGTTCTGTC 
TCCCGGGTTC 
ACCACTATGC 
GAAGCACCTA 



AGTGATCCAC 
AGCCAGAATA 
GTTACTTTCT 
TTCTCTTTTT 
CTGTAGTTAA 
TTGAGGTGGG 
GTCTTAGCTG 
CTCTACTCAA 
TCACTCTTTT 
TAGCTTCCAA 
GAGAAAAGAT 
GGAAATAGGT 
TCTACTTAAG 
TTTGTAGCTC 
AGTTAATATG 
TCAGCATTCC 
TGTTTTGAAA 
CTGCCTCTTT 
ATTACCCTCA 
ATCAGAGAAA 
TGATGAAAAT 
AATCCAGGCC 
TGATGAATGA 
TTGGATGTAA 
CATAGGTTCC 
AGTCTTTTCC 
TGTATTTGGT 
TAAGGTTGAT 
CCTTCAGATG 
TGTGGCTCTG 
ACAAAAGTGG 
CCAGCCGTAT 
AACCCCTGCC 
GGTGGCAAAT 
TTGGGTAAAA 
AATACACAAT 
AGAGCCTTCT 
ACAAAATTTA 
AGGTTATTAT 
TTTAAATCAC 
TGGTGTGCAT 
TCTCCTAAAA 
CAGTAGCATT 
CAAGCCCTGT 
GAAGGGAGGC 
AAAACCCTCT 
TTTGGCACTG 
CTCACAGAAA 
GCATTGCTAT 
TTTTCTGCTA 
ACCCAGGCTG 
AAGCTATTCT 
CCCACTAATT 
GAAACTCTAA 



AATCCTTGGC 
TATGTTCATT 
TGAGAAAATC 
CATATTGAGA 
AGAAACCACC 
GTTACTCTGA 
TAGCAACTTG 
CATTTAAGGT 
GCTTTAACAA 
AGGGAGTTCA 
AGTTCCACCA 
TTATATAAAA 
CCACCCATTT 
CAAGTGCCAC 
ATCAATTATT 
CTGCAGGGAA 
TCACTTTCAG 
TAATATGTGA 
GTGGTCCAGC 
TTGGTGCCAT 
CAGTGGACAG 
AATCTGGCAC 
CTGTTTAGCC 
ACGGGCCTTT 
AAATGGTGGC 
TCTTTTGCAG 
AATTTTTAAT 
GCTCTCCATG 
GGATTATTCC 
GGTGAGATGC 
ATAGCCTAAG 
GCCAGGCACC 
CATGTGAAAG 
GCTAAAAAGA 
GCCCATGTAT 
GACTTTGAGA 
TAGTGTGTAT 
TCCAAACTTA 
TTGACATTTA 
AACTGCGTGC 
GGGAGACAGC 
TCAGTAAGAC 
TGGAAGGGGT 
ATCTGAAGCC 
CCCCTGCACC 
TCTTTGGATC 
AGTCTGTCAC 
GAATTTCATA 
TTTTTCTCTT 
ACTCCTGCTT 
GAGTGCAGTG 
CCTGCCTCAG 
TTTGTATTTT 
TTCTTTGTAG 



CTCCCAAAGT 
TTGAGTCCTT 
TCTGAAAAGA 
ATTGTTTTTT 
TGTGTGTTGG 
GAATCAAAGG 
CTCCATTGTT 
CTCAGAAGAT 
ACCCTAGAGA 
GGACACCATG 
AATAAAGTTG 
TTTATTTTTT 
GCCAAAATAA 
TAACAATTCT 
TCATTTAAAT 
CTGCAGTGGC 
GGTGGTCATG 
CTCCTCAGAT 
GCTTATGAAC 
GGACATAAGA 
CATCATTATT 
CATGAGCTCT 
ATTTTAGAGT 
GCCCTCTCTT 
CTGAATACTA 
ATACCATCAT 
AGAAATGTAA 
TCCTTCCAAA 
ATTTTGTTCT 
TATAGGTACA 
TGGTGACTTT 
ACTCTAGGTG 
AGAATAAGAC 
AAAATTAAGC 
ATATGTTCTA 
AGTTACTGGC 
TCAGTGTTTT 
AGCCTTGCTT 
AATCCAACTG 
AAAATAAATG 
ACGAAGCTAA 
AGAAGCTGGT 
TGCTCTCATT 
ATCATGCCTA 
CTAGAAAGCT 
TGGACTTTAC 
TGCTGCTAAC 
GCTTCCAGCA 
GGGTGTTGCA 
TTTTTCTTTT 
GCACAATCTC 
CCTCCCAAGT 
TAGTATTGCT 
GTATCAAACC 



GCTATGATTA 
TAACAAAGTC 
TGCCAATAAT 
AAAAAGTTTG 
TTAAGCCATA 
AAAACCTGAA 
GAAATAAATA 
AATATAATTG 
GCTGGTAGGC 
ATTCACGACC 
AAATGCTGAC 
CCTTTTTTAT 
AGTGAGAATC 
TAGGACCTGA 
GGCTCTAATG 
TTTTATCAAC 
TAGTTGCTTT 
TCAGAAAGTG 
CCACATCTAA 
GGAAGGCACA 
TACAACTTTG 
AATTTTTGTT 
GTGGCATACG 
ATGAACATAG 
TTTACAACTA 
TATTCATATA 
TAATTGCTTC 
AAAAGGTATG 
TTGTTAATAT 
ATGACAAGTG 
TACCTCCACT 
CTAGGGATAC 
AATAAATAAG 
AGGCAAGAGG 
TTGGTTTTAT 
TTTTGATTTA 
AAGAGAGCTT 
TAGGTAAAAG 
AAGACTAATA 
GAACTGCCAT 
TCCCACTCAT 
CAGATTATCA 
AGGCAGTGCC 
GTTATGGTCC 
GGGTGGGTTC 
CTCTATCTGA 
TCAGCAGTTC 
TCCTCTCTCC 
GCTCTCTCTC 
TTTTTTTTTG 
GGCTCACTGC 
AGCTGGGACT 
GTCATCAATC 
CTAGGACTCT 



CAAGCATGAG 
ATAAGAATTT 
TTGTAGCCAA 
TATGTGTGAA 
AGTACATGTA 
GAAACAGGCA 
GGCTTGAACT 
GTGAAATTTA 
AGAGCCTCAA 
ACAATACATC 
AAGAAGGGGT 
TGTTATGGAA 
GTTTCTTTTG 
GCTATAAGCC 
TGCAGAGGGA 
TTGAACAGCT 
TTTGAAATCA 
CTCGCTAGTC 
CCCTATCCCC 
GTGAAGCAGA 
TAATCACCCA 
GGAGTTCTTG 
TGGCTGCTGG 
ACAGGAACTA 
AGGTACAATG 
TTTCTTCAAA 
TCAAGTTTAG 
TTGCTTTTAT 
ATACTTTGAG 
ATACGTGTGT 
CCAAATATAT 
AGCAGTAAAC 
TAAAGTGCAT 
ACTCATTGAA 
TTCTCTGGAG 
TCACACTATT 
GTGGATGAAT 
GGCTCCTCTT 
AGACTAATTA 
GCTCGCCAAG 
CTTGCAGGTT 
AGAGCCCTAG 
TGACCACAAC 
CCGACTGTTC 
TACTGTCTGC 
TTTTTTTTTC 
TAGGGTCATT 
TTCATTATAC 
TCCTTCCCAT 
AGACGGAGTC 
AACCTCCGCC 
ACAGGCGCTC 
CACATGTCCA 
TTCCTCTAAT 
CACAATATAT 
TACTTTCTGA 
TGTTCCCAGG 
TTATTTGTTC 
GAGTCTTGCT 
CCACCTCCTG 
CGTGTGTCAC 
TGGCAAGGCT 
TGCTGGGATT 
CTTAACAAAT 
TGAATATTTC 
TAAAATACCT 
GGGGTAAAAA 
AATACTAGGT 
AGTAAATGTA 
AAAAAGTATT 
TCCAAGAGAG 
TTTGTCTATC 
AATCTGTTAC 
TTACACCTGT 
TTTGAGATCA 
GCTGGATATG 
CCTTTGAGTC 
GGGCAATAAG 
TATAAACAAA 
AGAGCTAAAA 
AGTATAATTT 
TATCCATGTA 
TAATCTAAAA 
GATTCTCAAA 
CAGTATAATG 
AGGGATACTC 
TCTTCAAGGT 
TTGGATTTTC 
TCTTTCACCT 
TATAAGCAAC 
TTTCACCACT 
ACTGCATTTT 
TTCTTGCTTT 
CCGGAGTTAT 
ATTTTAGGGA 
TAGGTAATGT 
TGAACTTATC 
CATTGGGGCC 
AATGTTTGAT 
ATTTTTAGTT 
TTGTGGTGTT 
CTTAGTTGGC 
GTGTCTATCT 
TGTCAACTTG 
GTTTGTGAGC 
CTGTCTTTCC 
AAGGCAGAGG 
CTCATCTGGT 



AATCCCTGAT 
CCTGGAAAGC 
AAGAATCAGT 
TATCTGAATG 
TTGCTGCCCA 
GGTTCAAGTG 
CACACCTGGC 
TTCCTCGAAC 
ACAGGTGTGA 
AGTCTGACAC 
CAGATTTCCT 
GCCTCAAGTT 
CTGAAACAGG 
CATTTTTCCT 
TGTTAATTTA 
TTAAAAAATT 
AAATGAGGAA 
TGTTAGCTTT 
ATGCTCATAA 
AATCCCAGCG 
GCCTGGGCAA 
GTGGTGCATG 
CAGGAGTTTG 
GTGAGAACTT 
ACTTTTGTTT 
AGTACTTAAA 
TTATCCAGAA 
ATTAGCTCCC 
ATTGGAAATT 
GGAGTGCTCA 
CAAACATTCC 
AACGTGTGTT 
AACCTCTATC 
AGGAAAGTTG 
AGCTTTCCCC 
TCACATTGAT 
GTTTCCACTA 
CTTGTCATAT 
TCATGACCTT 
AGATTTTTTG 
GAACATGATA 
TTCAGGTTTC 
AATTTTGTTT 
AAATCTTAGA 
ACATTCTAAA 
ACTTTTGTAT 
CCAGTACTAT 
AATATTTTTG 
TTTTGACAAA 
ACTGAGTCAG 
GTGTTTCTGG 
CAGTGTGGAT 
AAGGGGGAAT 
CTCCTGCTCT 



TCCCAAACAC 
TCTTACACAA 
CACCCAACAG 
TAATCTCCCA 
GGCTGGAGTG 
ATTCTCCTGC 
TAATTTTTGT 
TCCCAAACTC 
GCCACCATGT 
AAAGTGGATA 
GGTGCTCTCA 
TTTATCTGTA 
AAATACATAT 
GTTTCCCCAA 
ATTTAAAAGA 
ATCTCTGGAA 
CTAGAGAGCA 
TTATTATTTT 
TAATAAGTTT 
CTTTGGGAAG 
CATAGTGAGA 
CTTGTACTCC 
AGGCTGCAGT 
GTCTCAAAAA 
CAAAATATGT 
AGTTAATAAC 
AATCATCCAT 
AGGTAATTAG 
CAAAATGCTC 
TGGAGTATTT 
AAATCTGAAA 
AGCTAATTAG 
CTCACTTCTA 
CAAAGATAGT 
CATTGTTAGG 
ACATGAAACT 
ATGTTTTCTT 
CTCCCTAGTC 
AACAGTCCTG 
AAATAATACC 
TCCACATGAC 
TCTACTGCAA 
TCTTCCATGA 
TCATGTAAAT 
AGATGTAATG 
AAGGTGTGAG 
TTGTTGCTAA 
TTGGTTTATT 
ACTGTTGATT 
GGGATAACCA 
ATGAGATTAG 
GGCATTATGC 
TTGGGCCTTT 
TGAACTGGGA 



GGTCTTTTCA 
ACACGCCCTC 
TGTCCTTGTC 
GAGGGTGTTA 
CAGTGGCATG 
CTCAGCCTCC 
ATTTTTAGTA 
AGGTGATCCA 
CCAGCCCCAT 
TAACAATATT 
AAGTTTTATG 
CTATGATTTC 
AACTGAAAAA 
CTTCATTTTC 
AGTAGTCTAC 
GGATACACAG 
TGGCCAAGTG 
CTTTTGTAGG 
AAAATAAAAC 
CAGAGGTGGG 
CCCTGCCTCT 
TAGCTACTTG 
GAGCTATAAT 
AAAAAGGGGG 
AATATTTAGC 
TATTGTCTCC 
ATCAGCAAGC 
CAGGCAGCCT 
CAAAATCTGC 
CAGATTTTGG 
AAATCTGAAA 
ACCCTTCATG 
ATAGCATGAA 
ACAAAGACAG 
ATTTTACATT 
CTATTAACCA 
TCTGTTCCAA 
TTTTTTTGTC 
AAGATCATTT 
ACAAGGGCAA 
ATCACTGATA 
AGTGATTTTT 
CTAATACTTT 
TTTCTTCTAT 
TTTGATACAT 
AGATGTCTCC 
GACTATCTTT 
TCTAGACTGT 
ACAGTAAGCT 
GCTATCTGGT 
CCTTTGAATA 
CACCTGATAT 
TTTTCTGCCT 
TTTACATCAT 



TATACATTTT 
CCCTAGGAAG 
ACATCTTAGG 
TCATCTTTTT 
ATCTCGGCTC 
TGAGTAGCTG 
GAGACAGGGT 
CCCACCTCAG 
CTTTTTCTTT 
TTGAATTATG 
TTACAAAAGA 
AAACCAAATA 
TTTTGGTATG 
TATAGCAATA 
CATCTCTTCT 
GGAACATTGC 
GGGTTTTGCT 
TTTGAATTTC 
TTTTGGCTGG 
AGGATACTTG 
GTAGAAATAA 
GGAGGTTGAG 
CACCCACTGC 
GGGGGAAACA 
ACTAAAGAAT 
TTTAAAAGAA 
TAAACTTTCT 
CTACTCAGGT 
AACTTTTTGA 
ATTTTTGGAT 
TACTTCTGGT 
GTCTCTTCTA 
CTTTTCTGTT 
TACAGGAGAG 
ATTATGATAC 
AACCCTAGAC 
GGTCCAATCT 
TGTGACAATG 
GCTTTTTTTT 
AGGGCCCTTC 
TTAACCTTCA 
TTCCCTTAAT 
TGTTATTATA 
ATTTTATTCT 
TACATCTAGT 
AGTTTCACTT 
TTTCCATTGA 
TTATCTCATT 
TTGAAATAGT 
TAAACATTAT 
GGTGATCCTA 
TCAGGGTCTG 
CACTGCTTGA 
CAGTTCCTCT 



CCACTGTACA 
CCTTTATAAA 
TTCTACACCT 
TTTTGAGATG 
ACAGCAACCT 
GGATTACAGA 
TTCACCGTGT 
CCTCCCAAAG 
TAGTTTAGTT 
AATAACTAAA 
AAAACAAGTC 
AAAAACAGGT 
TTAGTATGAT 
AAAAGAAACA 
GTTAAAAAGA 
TCTGGTTTCT 
TTTGTTTTTG 
AAACCACATA 
GTGCAATGAC 
AGGCCAGGAA 
ACAAAAATTA 
GCAGGAGGAT 
ACTATAGCAT 
AATAAATAAA 
TCTGAATTGT 
TTGTTATCAA 
CAAAATGACA 
TGAGTATTCC 
ATGCTAACAT 
TTGAGATACT 
TCTAAGCATA 
GACCTCAGCT 
TTAGAATAAT 
TTCCCATATA 
ATTTGTCAAA 
TTTATGTGGA 
GGAATACCAC 
TCTCAGTCTT 
CATAATTACA 
TTGTCACATC 
TCATGTGGTT 
TTAGCCCACC 
GCTAAAACTT 
AAAAGCTTGT 
CCTTTGATTT 
TATTAACACA 
TTACCTTTGC 
CCACTGATTT 
TCATTTTTTG 
TTCTGGCTGT 
GTAAAGTAAA 
AATAGAAGAA 
GCTGGGACAT 
GGTTCTCAGG 
CCTTCAGATT 
AGATCATGGG 
AACTGCCCCA 
CAGATTTCCC 
AGTGCAGGGG 
GAAAGCTCTA 
ACATTACATT 
AGTCATTCTT 
AGAAGTACTT 
AAAAAGTTAA 
ATTTAGGAAT 
AATGGAGCAT 
AAAAGGAGCT 
TGTGTGTGTG 



CAGACTGAAT 
ACTCCTCATC 
CTGCAGATTA 
TTCATCCAGT 
TGGGGATGAG 
GAAGTGTTTG 
TTTCCCAGAA 
CCTGATTATC 
TTGGAACACA 
TGAAAAACTA 
TTGCCTTACC 
AGAATAAGAG 
ATAAAGCCTT 
TGTGTGTGTG 



CATACCACCA 
TTCCATAAAT 
AGGCTTTTTT 
GCCCTCTCCT 
GGTATAGTCC 
ATACATACAT 
AAAAAGGAAT 
AAAGGTAAAC 
AGGAATTCTC 
TAGTACCTTC 
AAGTAAAACA 
TAGTAAAGAA 
TAGGTATTTT 
TGTGTG 



GCTTTCCTGG 
GCATGAGCCA 
CCACTAGGTG 
CTTTAAGTTA 
TCTTGTTTGC 
AAACAAGGCA 
GTATAGGCAT 
AGTTATTAAT 
TGGGAGTCCT 
CTATAAGCTG 
TAAGGGCAGC 
TGCCAAAAAT 
CACACTTGCT 



GTCTCCAGCT 
ATTCAGTCTA 
AAATAAAGAA 
CAACACATTG 
TGAGAAGAGA 
TGGTTTTTGC 
CACGTAACTG 
CCTATACCAA 
TACTACTCTC 
GATGACTAAT 
TGAGGTGCTG 
GCTGTCATGT 
CTGTTACGTA 



TGCAGATTAC 
TGTCCTTGAA 
GCTTGTTAGA 
GCTACACCTA 
ACTGTATTGG 
ACTTAATTTC 
TACTAGCTGG 
GATGTCAAGG 
AAGCCCAGTG 
TACCAGGCTC 
ACTGAAGACA 
ATCCATTGAC 
AATGTATGTG 
CHIP-BASED SPECIATION AND PHENOTYPIC 
CHARACTERIZATION OF MICROORGANISMS 

This application is a continuation-in-part of and 
claims the benefit of the priority dates of USSN 60/011,339, 
filed 08 Feb. 1996; 60/012,631, filed 01 March 1996; 
08/629,031, filed 08 April 1996; and 60/017,765, filed 15 May 
1996, the disclosures of which are specifically incorporated 
by reference in their entirety for all purposes. 

BACKGROUND OF THE INVENTION 

Copyright Notice 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the xerographic 
reproduction by anyone of the patent document or the patent 
disclosure in exactly the form it appears in the Patent and 
Trademark Office patent file or records, but otherwise 
reserves all copyright rights whatsoever. 

Field of the Invention 

This invention relates to the identification and 
characterization of microorganisms. 

Background of the Invention 

Multidrug resistance and human immunodeficiency 
virus (HIV-1) infections are factors which have had a profound 
impact on the tuberculosis problem. An increase in the 
frequency of Mycobacterium tuberculosis strains resistant to 
one or more anti-mycobacterial agents has been reported, 
Block, et al., (1994) JAMA 271:665-671. Immunocompromised 
HIV-1 infected patients not infected with M. tuberculosis are 
frequently infected with M. avium complex (MAC) or M. avium-M. 
intracellulare (MAI) complex. These mycobacterial species are 
often resistant to the drugs used to treat M. tuberculosis. 
These factors have re-emphasized the importance of the 
accurate determination of drug sensitivities and of 
mycobacterial species identification. 

In HIV-1 infected patients, the correct diagnosis of 
the mycobacterial disease is essential since treatment of M. 
tuberculosis infections differs from that called for by other 
mycobacterial infections, Hoffner, S.E. (1994) Eur. J. Clin. 
Microbiol. Inf. Dis. 13:937-941. Non-tuberculosis 
mycobacteria commonly associated with HIV-1 infections include 
M. kansasii, M. xenopi, M. fortuitum, M. avium and M. 
intracellulare, Wolinsky, E., (1992) Clin. Infect. Dis. 
15:1-12, Shafer, R.W. and Sierra, M.F. (1992) Clin. Infect. Dis. 
15:161-162. Additionally, 13% of new cases (HIV-1 infected 
and non-infected) of M. tuberculosis are resistant to one of 
the primary anti-tuberculosis drugs (isoniazid [INH], rifampin 
[RIF], streptomycin [STR], ethambutol [EMB] and pyrazinamide 
[PZA]) and 3.2% are resistant to both RIF and INH, Block, et 
al., JAMA 271:665-671, (1994). Consequently, mycobacterial 
species identification and the determination of drug 
resistance have become central concerns during the diagnosis 
of mycobacterial diseases. 

Methods used to detect and to identify 
Mycobacterium species vary considerably. For detection of 
Mycobacterium tuberculosis, microscopic examination of 
acid-fast stained smears and cultures are still the methods of 
choice in most microbiological clinical laboratories. 
However, culture of clinical samples is hampered by the slow 
growth of mycobacteria. A mean time of four weeks is required 
before sufficient growth is obtained to enable detection and 
possible identification. Recently, two more rapid methods for 
culture have been developed involving a radiometric, Stager, 
C.E. et al., (1991) J. Clin. Microbiol. 29:154-157, and a 
biphasic (broth/agar) system, Sewell, et al., (1993) J. Clin. 
Microbiol. 29:2689-2472. Once grown, cultured mycobacteria 
can be analyzed by lipid composition, the use of 
species-specific antibodies, species-specific DNA or RNA probes, 
PCR-based sequence analysis of the 16S rRNA gene (Schirm, et al. 
(1995) J. Clin. Microbiol. 33:3221-3224; Kox, et al. (1995) J. 
Clin. Microbiol. 33:3225-3233) and IS6110-specific repetitive 
sequence analysis (for a review see, e.g., Small, P.M. 
and van Embden, J.P.A. (1994) Am. Society for Microbiology, 
pp. 569-582). The analysis of 16S rRNA sequences (RNA and 
DNA) has been the most informative molecular approach to 
identify Mycobacteria species (Jonas, et al., J. Clin. 
Microbiol. 31:2410-2416 (1993)). However, to obtain drug 
sensitivity information for the same isolate, additional 
protocols (culture) or alternative gene analysis is necessary. 

To determine drug sensitivity information, culture 
methods are still the protocols of choice. Mycobacteria are 
judged to be resistant to particular drugs by use of either 
the standard proportional plate method or minimal inhibitory 
concentration (MIC) method. However, given the inherent 
lengthy times required by culture methods, approaches to 
determine drug sensitivity based on molecular genetics have 
been recently developed. 

Table 1 lists the M. tuberculosis genes which, 
when mutated, have been shown to confer drug resistance (other 
genes are known, e.g., the pncA gene). Of the drugs listed in 
Table 1, RIF and INH form the backbone of tuberculosis 
treatment. Detection of RIF resistance in M. tuberculosis is 
important not only because of its clinical and epidemiological 
implications but also because it is a marker for the highly 
threatening multidrug resistant phenotype (Telenti, et al. 
(1993) The Lancet 341:647-650). Of the drug resistances 
listed in Table 1, decreased sensitivity to RIF is the best 
understood on a genetic basis. 

Table 1 

M. tuberculosis Genes with Mutations Which Confer Drug Resistance 

Drug      Gene   Size (bp)   Gene Product 
RIF       rpoB   3,534       β-subunit of RNA polymerase 
INH       katG   2,205       catalase-peroxidase 
INH-ETH   inhA   810         fatty acid biosynthesis 
STR       rpsL   372         ribosomal protein S12 
          rrs    1,464       16S rRNA 
FQ        gyrA   2,517       DNA gyrase A subunit 

Because resistance to RIF in E. coli strains was 
observed to arise as a result of mutations in the rpoB gene, 
Telenti, et al., id., identified a 69 base pair (bp) region of 
the M. tuberculosis rpoB gene as the locus where RIF resistance 
mutations were focused. Kapur, et al., (1995) Arch. Pathol. 
Lab. Med. 119:131-138, identified additional novel mutations 
in the M. tuberculosis rpoB gene which extended this core 
region to 81 bp. In a detailed review on antimicrobial agent 
resistance in mycobacteria, Musser (Clin. Microbiol. Rev., 
8:496-514 (1995)) summarized all the characterized mutations 
and their relative frequency of occurrence in this 81 bp 
region of rpoB. Missense mutations comprise 88% of all known 
mutations, while insertions (3 or 6 bp) and deletions (3, 6 and 
9 bp) account for the remaining 4% and 8%, respectively. 
Approximately 90% of all RIF resistant 
tuberculosis isolates have been shown to have mutations in 
this 81 bp region. The remaining 10% are thought to 
involve genes other than rpoB. 

For the above reasons, it would be desirable to have
simpler methods which identify and characterize
microorganisms, such as Mycobacteria, both at the phenotypic
and genotypic level. This invention fulfills that and related
needs.

SUMMARY OF THE INVENTION 

The present invention provides systems, methods, and
devices for characterizing and identifying organisms. One
aspect of the invention provides a method for identifying a
genotype of a first organism, comprising:

(a) providing an array of oligonucleotides at known
locations on a substrate, said array comprising probes
complementary to reference DNA or RNA sequences from a second
organism;

(b) hybridizing a target nucleic acid sequence from
the first organism to the array; and

(c) based on an overall hybridization pattern of the
target to the array, identifying the genotype of the first
organism, and optionally identifying a phenotype of the first
organism.

Another aspect of the invention provides a method
for identifying the genotype and/or phenotype of an organism
by comparing a target nucleic acid sequence from a first
organism coding for a gene (or its complement) to a reference
sequence coding for the same gene (or its complement) from a
second organism, the method comprising:

(a) hybridizing a sample comprising the target 
nucleic acid or a subsequence thereof to an array of 
oligonucleotide probes immobilized on a solid support, the 
array comprising: 

a first probe set comprising a plurality of probes, 
each probe comprising a segment of nucleotides exactly 
complementary to a subsequence of the reference sequence, the 
segment including at least one interrogation position 
complementary to a corresponding nucleotide in the reference 
sequence; 

(b) determining which probes in the first probe set 
bind to the target nucleic acid or subsequence thereof 
relative to their binding to the reference sequence, such 
relative binding indicating whether a nucleotide in the target 
sequence is the same or different from the corresponding 
nucleotide in the reference sequence; 

(c) based on differences between the nucleotides of 
the target sequence and the reference sequence identifying the 
phenotype of the first organism; 

(d) deriving one or more sets of differences 
between the reference sequence and the first organism; and 

(e) comparing the set of differences to a data base 
comprising sets of differences correlated with speciation of 
organisms to identify the genotype of the first organism. 
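
Steps (d) and (e) above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions:
the base calls are taken as already derived from the array, the
names difference_set, rank_species and the toy data base are
hypothetical, and Jaccard similarity is used as a stand-in
scoring rule that the specification does not prescribe.

```python
from typing import Dict, FrozenSet, List, Tuple

Difference = Tuple[int, str]  # (0-based position, base called in the target)


def difference_set(reference: str, target_calls: str) -> FrozenSet[Difference]:
    """Collect positions where the target base call differs from the reference."""
    if len(reference) != len(target_calls):
        raise ValueError("reference and target calls must have the same length")
    return frozenset(
        (pos, base)
        for pos, (ref, base) in enumerate(zip(reference, target_calls))
        if base != ref and base != "N"  # "N" marks an ambiguous call (assumption)
    )


def rank_species(
    diffs: FrozenSet[Difference],
    database: Dict[str, FrozenSet[Difference]],
) -> List[Tuple[str, float]]:
    """Rank data base entries by Jaccard similarity to the observed differences."""
    ranked = []
    for species, known in database.items():
        union = diffs | known
        score = len(diffs & known) / len(union) if union else 1.0
        ranked.append((species, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: a 10-base reference and two known difference sets.
reference = "ACGTACGTAC"
target_calls = "ACTTACGCAC"
database = {
    "species_A": frozenset({(2, "T"), (7, "C")}),
    "species_B": frozenset({(2, "T"), (5, "A")}),
}

diffs = difference_set(reference, target_calls)
ranking = rank_species(diffs, database)
print(sorted(diffs))
print(ranking[0][0])
```

Any other set-similarity or probabilistic score could replace
the Jaccard measure; the point is only that the derived
differences, not the raw sequence, are what is looked up.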

Another aspect of the invention provides a method 
for identifying the genotype and/or phenotype of an organism 
by comparing a target nucleic acid sequence from a first 
organism coding for a gene (or its complement) to a reference 
sequence coding for the same gene (or its complement) from a 
second organism, the method comprising: 

(a) hybridizing a sample comprising the target
nucleic acid or a subsequence thereof to an array of
oligonucleotide probes immobilized on a solid support, the
array comprising:

a first probe set comprising a plurality of probes,
each probe comprising a segment of nucleotides exactly
complementary to a subsequence of the reference sequence, the
segment including at least one interrogation position
complementary to a corresponding nucleotide in the reference
sequence, wherein each interrogation position corresponds to a
nucleotide position in the reference or target sequence;

(b) determining a hybridization intensity from each
probe;

(c) plotting the hybridization intensities versus 
the nucleotide position corresponding to the probe from which 
the hybridization intensity was determined to derive a target 
plot of hybridization intensity; 

(d) repeating steps (a) - (c) with the target 
sequence replaced by the reference sequence, to derive a 
baseline plot of the reference sequence; and 

(e) comparing the target plot to the baseline plot 
to identify the genotype and/or phenotype of the organism. 

Another aspect of the invention provides an array of 
oligonucleotide probes immobilized on a solid support, the 
array comprising: 

a first probe set comprising a plurality of probes, 
each probe comprising a segment of nucleotides exactly 
complementary to a subsequence of a reference sequence, the 
segment including at least one interrogation position 
complementary to a corresponding nucleotide in the reference 
sequence; 

wherein the reference sequence is a gene from 
Mycobacterium tuberculosis. 

Another aspect of the invention provides a method of 
identifying the presence of a nucleic acid polymorphism in a 
patient sample, comprising the steps of: 

(a) determining the difference between the
hybridization intensities of a nucleic acid sequence from the
patient sample and a corresponding nucleic acid sequence from
a wild type sample to an array of reference nucleic acid
probes;

(b) deriving ratios of the difference in (a) to the
hybridization intensity of the wild type sample for each base
position corresponding to each reference nucleic acid probe;
and

(c) identifying the presence of a polymorphism at a
base position corresponding to a reference probe if the ratio
in (b) for the base position corresponding to the reference
probe is greater than or equal to an assigned value.
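
The three steps above amount to a per-position ratio test, which
the following Python sketch illustrates. It is not the claimed
implementation: the function name call_polymorphisms is
hypothetical, the difference is assumed to be wild type minus
patient (a mismatch usually lowers the signal), and the default
assigned value of 0.25 is borrowed from the figure descriptions.

```python
from typing import List, Sequence, Tuple


def call_polymorphisms(
    patient: Sequence[float],
    wild_type: Sequence[float],
    assigned_value: float = 0.25,
) -> List[Tuple[int, float]]:
    """Return (position, ratio) for positions whose ratio meets the cutoff."""
    if len(patient) != len(wild_type):
        raise ValueError("intensity vectors must have the same length")
    calls = []
    for pos, (p, w) in enumerate(zip(patient, wild_type)):
        if w <= 0:
            continue  # ratio is undefined without a wild type signal
        ratio = (w - p) / w  # step (a) difference, step (b) ratio
        if ratio >= assigned_value:  # step (c) threshold test
            calls.append((pos, ratio))
    return calls


# Hypothetical intensities for four base positions.
wild_type = [100.0, 120.0, 80.0, 200.0]
patient = [95.0, 60.0, 80.0, 150.0]

calls = call_polymorphisms(patient, wild_type)
print(calls)
```

Positions with no usable wild type signal are skipped rather
than called, which is a design choice of this sketch.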

Another aspect of the invention provides a computer 
program product that identifies the presence of a nucleic acid 
polymorphism in a patient sample, comprising: 

computer code that determines the difference between
the hybridization intensities of a nucleic acid sequence from
the patient sample and a corresponding nucleic acid sequence
from a wild type sample to an array of reference nucleic acid
probes;

computer code that derives ratios of the difference
to the hybridization intensity of the wild type sample for
each base position corresponding to each reference nucleic
acid probe;

computer code that identifies the presence of a
polymorphism at a base position corresponding to a reference
probe if the ratio for the base position corresponding to the
reference probe is greater than or equal to an assigned value;
and

a computer readable medium that stores the computer
codes.

Another aspect of the invention provides, in a
computer system, a method of assigning an organism to a group,
comprising the steps of:

inputting groups of a plurality of known nucleic
acid sequences, the plurality of known nucleic acid sequences
being from known organisms;

inputting hybridization patterns for the plurality
of known nucleic acid sequences, each hybridization pattern
indicating hybridization of subsequences of the known nucleic
acid sequence to subsequences of a reference nucleic acid
sequence;

inputting a hybridization pattern for a sample 
nucleic acid sequence from the organism indicating 
hybridization of subsequences of the sample nucleic acid 
sequence to subsequences of the reference nucleic acid 
sequence; 

comparing the hybridization pattern for the sample 
nucleic acid sequence to the hybridization patterns for the 
plurality of known nucleic acid sequences; and 

assigning a particular group to which the organism 
belongs according to the group of at least one of the known 
nucleic acid sequences that has a hybridization pattern that 
most closely matches the hybridization pattern of the sample 
nucleic acid sequence at specific locations. 
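
The comparing and assigning steps above can be pictured as a
nearest-pattern lookup. The Python sketch below is a minimal
illustration under assumptions: hybridization patterns are
encoded as lists of 0/1 values (1 meaning the subsequence
hybridized), the mismatch count at the chosen locations serves
as the closeness measure, and all names and data are
hypothetical.

```python
from typing import Dict, List, Sequence


def assign_group(
    sample: Sequence[int],
    known_patterns: Dict[str, Sequence[int]],
    groups: Dict[str, str],
    locations: List[int],
) -> str:
    """Assign the group of the known sequence whose pattern is closest."""

    def mismatches(a: Sequence[int], b: Sequence[int]) -> int:
        # Compare only at the specific locations of interest.
        return sum(a[i] != b[i] for i in locations)

    closest = min(
        known_patterns,
        key=lambda name: mismatches(sample, known_patterns[name]),
    )
    return groups[closest]


# Hypothetical patterns for three known sequences and their groups.
known_patterns = {
    "isolate_1": [1, 1, 0, 1, 0, 1],
    "isolate_2": [1, 0, 0, 0, 1, 1],
    "isolate_3": [0, 1, 1, 1, 0, 0],
}
groups = {"isolate_1": "group_X", "isolate_2": "group_Y", "isolate_3": "group_X"}

sample = [1, 0, 0, 0, 1, 0]
assigned = assign_group(sample, known_patterns, groups, locations=[1, 3, 4])
print(assigned)
```

Restricting the comparison to selected locations mirrors the
"at specific locations" language; ties are broken here by
insertion order, which a real classifier would need to handle
explicitly.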

Another aspect of the invention provides a computer 
program product that assigns an organism to a group, 
comprising: 

computer code that receives as input groups of a 
plurality of known nucleic acid sequences, the plurality of 
known nucleic acid sequences being from known organisms; 

computer code that receives as input hybridization 
patterns for the plurality of known nucleic acid sequences, 
each hybridization pattern indicating hybridization of 
subsequences of the known nucleic acid sequence to 
subsequences of a reference nucleic acid sequence;

computer code that receives as input a hybridization 
pattern for a sample nucleic acid sequence from the organism 
indicating hybridization of subsequences of the sample nucleic 
acid sequence to subsequences of the reference nucleic acid 
sequence; 

computer code that compares the hybridization 
pattern for the sample nucleic acid sequence to the 
hybridization patterns for the plurality of known nucleic acid 
sequences; 

computer code that assigns a particular group to
which the organism belongs according to the groups of at least
one of the known nucleic acid sequences that has a
hybridization pattern that most closely matches the
hybridization pattern of the sample nucleic acid sequence at
specific locations; and

a computer readable medium that stores the computer
codes.

Another aspect of the invention provides, in a 
computer system, a method of assigning groups to which 
organisms belong utilizing a generic probe array, comprising 
the steps of:

inputting hybridization intensities for a plurality 
of isolates, the hybridization intensities indicating 
hybridization affinity between the isolate and the generic 
probe array; 

selecting hybridization intensities that have the 
most variance across the plurality of isolates; and 

assigning each of the plurality of isolates to a 
group according to the selected hybridization intensities. 
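
A minimal Python sketch of the selecting and assigning steps
follows. It assumes each isolate is described by one intensity
per probe, keeps the k probes with the largest population
variance, and then groups isolates with a simple greedy
distance rule. The greedy rule is a stand-in chosen for brevity
and is not taken from the specification; the names
top_variance_probes and group_isolates and the data are
hypothetical.

```python
from math import dist
from statistics import pvariance
from typing import Dict, List, Sequence


def top_variance_probes(intensities: Dict[str, Sequence[float]], k: int) -> List[int]:
    """Return indices of the k probes whose intensity varies most across isolates."""
    rows = list(intensities.values())
    n_probes = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_probes)]
    return sorted(range(n_probes), key=lambda j: variances[j], reverse=True)[:k]


def group_isolates(
    intensities: Dict[str, Sequence[float]],
    probes: List[int],
    max_distance: float,
) -> Dict[str, int]:
    """Greedy grouping: join the first group whose representative is close enough."""
    representatives: List[List[float]] = []
    assignment: Dict[str, int] = {}
    for name, row in intensities.items():
        vector = [row[j] for j in probes]
        for group_id, representative in enumerate(representatives):
            if dist(vector, representative) <= max_distance:
                assignment[name] = group_id
                break
        else:
            # No existing group is close enough, so start a new one.
            representatives.append(vector)
            assignment[name] = len(representatives) - 1
    return assignment


# Hypothetical normalized intensities: four isolates, four probes.
intensities = {
    "iso_a": [0.9, 0.5, 0.1, 0.8],
    "iso_b": [0.8, 0.5, 0.2, 0.8],
    "iso_c": [0.1, 0.5, 0.9, 0.8],
    "iso_d": [0.2, 0.5, 0.8, 0.8],
}

selected = top_variance_probes(intensities, k=2)
isolate_groups = group_isolates(intensities, selected, max_distance=0.3)
print(sorted(selected))
print(isolate_groups)
```

Probes that read the same for every isolate carry no grouping
information, which is why the variance filter comes first.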

Another aspect of the invention provides a computer
program product that assigns groups to which organisms belong
utilizing a generic probe array, comprising:

computer code that receives as input hybridization 
intensities for a plurality of isolates, the hybridization 
intensities indicating hybridization affinity between the 
isolate and the generic probe array; 

computer code that selects hybridization intensities 
that have the most variance across the plurality of isolates; 

computer code that assigns a group to each of the 
plurality of isolates according to the selected hybridization 
intensities; and 

a computer readable medium that stores the computer
codes.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1: Basic tiling strategy. The figure
illustrates the relationship between an interrogation position
(I) and a corresponding nucleotide (n) in the reference
sequence, and between a probe from the first probe set and
corresponding probes from second, third and fourth probe sets.

Fig. 2: Segment of complementarity in a probe from
the first probe set.

Fig. 3: Incremental succession of probes in a basic
tiling strategy. The figure shows four probe sets, each
having three probes. Note that each probe differs from its
predecessor in the same set by the acquisition of a 5'
nucleotide and the loss of a 3' nucleotide, as well as in the
nucleotide occupying the interrogation position.

Fig. 4: Exemplary arrangement of lanes on a chip.
The chip shows four probe sets, each having five probes and
each having a total of five interrogation positions (I1-I5),
one per probe.

Fig. 5: Strategies for detecting deletion and
insertion mutations. Bases in brackets may or may not be
present.

Fig. 6: Shows the light directed synthesis of
oligonucleotide probes on a substrate.

Fig. 7: Shows the synthesis of a combinatorial
array of all possible tetranucleotide oligomers on a chip.

Fig. 8: A schematic diagram of target preparation.

Fig. 9: A tiling strategy for sequence
determination.

Fig. 10: A mismatch profile for an octamer based
chip.

Fig. 11: A hypothetical six-class tree based
classification system. The numbers underneath the terminal
nodes are the class assignments as determined by this
classifier.

Fig. 12: An image of the Mtb rpoB chip analysis of
the 700 bp region of the rpoB gene from an M. tuberculosis
isolate.

Fig. 13: Shared single nucleotide polymorphisms of 
seven Mycobacterium species. 

Fig. 14: Unique (species-specific) single
nucleotide polymorphisms of seven Mycobacterium species.

Fig. 15A and 15B: Hybridization patterns and bar
code fingerprint representations of seven Mycobacterium
species.

Fig. 16: Bar code fingerprint representations of
seven M. gordonae clinical isolates and the core fingerprint
derived therefrom.

Fig. 17: Plot of hybridization intensity vs.
nucleotide position using M. gordonae as target on a
Mycobacterium tuberculosis rpoB chip. The bottom panel shows
the sequences of the rpoB genes of M. tuberculosis and M.
gordonae with the position of difference outlined in black.

Fig. 18: Plot of hybridization intensity vs.
nucleotide position using other Mycobacterium species as
target on a Mycobacterium tuberculosis rpoB chip.

Fig. 19: Plots of hybridization intensity vs.
nucleotide position using Mycobacterium species as target on
a Mycobacterium tuberculosis rpoB chip overlaid on the
corresponding plot for Mycobacterium tuberculosis.

Fig. 20A-20D and 21: Plots of hybridization
intensity vs. nucleotide position of an unknown patient sample
compared to plots of known Mycobacterium species as target on
a Mycobacterium tuberculosis rpoB chip.

Fig. 22: Plots of hybridization intensity vs.
nucleotide position of Mycobacterium gordonae isolates as
target on a Mycobacterium tuberculosis rpoB chip compared to
a reference ATCC isolate.

Fig. 23 (A, B): Design of a tiled array.

Fig. 24: Effect and positional dependence of a
single base mismatch on hybrid stability using the MT1 DNA
chip. The sequences of the perfect match probe and each A:A
single base mismatch probe are shown. The results of five
independent experiments are plotted.

Fig. 25 (A, B, C): Detection of base differences in
a 2.5 kb region of human mitochondrial DNA between a sample
and reference target by comparison of scaled p° hybridization
intensity patterns.

Fig. 26 (A, B): Detection of deletion sequences of
human mitochondrial DNA.

Fig. 27 (A, B, C): Hybridization of 16.3 kb of a
mitochondrial target to chip with the entire mitochondrial
genome.

Fig. 28 (A, B): (A) Overlay of hybridization
intensities of Exon 12 of the MSH2 gene from a patient sample
and from a wild type sample. (B) Plot of hybridization
intensity differences greater than 0.25 between patient sample
and wild type sample as a function of base position.

Fig. 29: Plot of hybridization intensity
differences greater than 0.25 between patient sample and wild
type sample as a function of base position for Exon 13 and
Exon 16 of the MLH1 gene.

Fig. 30: Plot of hybridization intensity
differences greater than 0.25 between patient sample and wild
type sample as a function of base position for Exon 12 of the
MSH2 gene.

Fig. 31: Plot of hybridization intensity
differences greater than 0.25 between patient sample and wild
type sample as a function of base position for Exon 5 of the
p53 gene.

Fig. 32: Computer that may be utilized to execute
software embodiments of the present invention.

Fig. 33: A system block diagram of a typical
computer system that may be used to execute software
embodiments of the invention.

Fig. 34: A high level flowchart of identifying the
presence of a polymorphism in a nucleic acid sequence from a
patient sample.

Fig. 35: A high level flowchart of a method of
identifying a species within a genus to which an organism
belongs.

Fig. 36: A high level flowchart of a method of
identifying species within a genus to which organisms belong.

Fig. 37: A hierarchical clustering of isolates of
Mycobacterium.

DESCRIPTION OF THE PREFERRED EMBODIMENT

This invention provides methods, compositions and
devices for identifying the group or species of an organism
and obtaining functional phenotypic information about the
organism based on genotypic analysis of one or more genomic
regions of the organism. In one embodiment, the method
compares a target nucleic acid sequence from the organism
coding for a gene (or its complement) to a reference sequence
coding for the same gene (or its complement).

In principle, a reference sequence from any genomic
region of the organism can be used. When phenotypes are being
identified, it will be understood by one of skill in the art
that mutations within that region will affect the phenotypic
trait which is being characterized. Genotyping, by contrast,
only requires that a polymorphism, which may or may not code
for a mutation, be present. The reference sequence can be
from a highly polymorphic region, a region of intermediate
polymorphic complexity or, in some cases, a highly conserved
region. Highly polymorphic regions are typically more
informative when doing speciation analysis. The method
disclosed herein is readily applicable to using reference
sequences from highly polymorphic regions, though in certain
cases one may prefer to use a reference sequence from a highly
conserved region within the organism, since this reduces the
mathematical complexity of the deconvolution required of the
overall hybridization patterns observed during the analysis.
In this context, a "highly conserved region" of an organism
refers to a degree of conservation at the genotypic level of
greater than 50%, preferably greater than 75%, and more
preferably greater than 90%. A particularly useful reference
sequence is the 700 bp region of the rpoB gene from
Mycobacterium tuberculosis (Mt), since it is well defined.
Furthermore, an 81 bp segment within this gene contains all
the known mutations which code for rifampicin resistance in
M. tuberculosis. Other useful reference sequences include the
16S rRNA, rpoB, katG, inhA, gyrA, 23S rRNA, rrs, pncA, and
rpsL genes. The invention is particularly
useful for phenotypic and genotypic characterization of
microorganisms. In this context, the term "microorganism"
refers to bacteria, fungi, protozoa or viruses.

The invention finds particular utility in assaying
biological samples. The term "biological sample", as used
herein, refers to a sample obtained from an organism or from
components (e.g., cells) of an organism. The sample may be of
any biological tissue or fluid. Frequently the sample will be
a "clinical sample" which is a sample derived from a patient.
Such samples include, but are not limited to, sputum, blood,
blood cells (e.g., white cells), tissue or fine needle biopsy
samples, urine, peritoneal fluid, and pleural fluid, or cells
therefrom. Biological samples may also include sections of
tissues such as frozen sections taken for histological
purposes.

Frequently, it is desirable to amplify the nucleic
acid sample prior to hybridization. Suitable amplification
methods include, but are not limited to, polymerase chain
reaction (PCR) (Innis, et al., PCR Protocols. A Guide to
Methods and Application. Academic Press, Inc. San Diego,
(1990)), ligase chain reaction (LCR) (see Wu and Wallace,
Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077
(1988) and Barringer, et al., Gene, 89: 117 (1990)),
transcription amplification (Kwoh, et al., Proc. Natl. Acad.
Sci. USA, 86: 1173 (1989)), and self-sustained sequence
replication (Guatelli, et al., Proc. Natl. Acad. Sci. USA, 87:
1874 (1990)).

In a preferred embodiment, the hybridized nucleic
acids are detected by detecting one or more labels attached to
the sample nucleic acids. The labels may be incorporated by
any of a number of means well known to those of skill in the
art. However, in a preferred embodiment, the label is
simultaneously incorporated during the amplification step in
the preparation of the sample nucleic acids. Thus, for
example, polymerase chain reaction (PCR) with labeled primers
or labeled nucleotides will provide a labeled amplification
product. In a preferred embodiment, transcription
amplification, as described above, using a labeled nucleotide
(e.g., fluorescein-labeled UTP and/or CTP) incorporates a label
into the transcribed nucleic acids.

Alternatively, a label may be added directly to the
original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA,
etc.) or to the amplification product after the amplification
is completed. Means of attaching labels to nucleic acids are
well known to those of skill in the art and include, for
example, nick translation or end-labeling (e.g., with a labeled
RNA) by kinasing of the nucleic acid and subsequent attachment
(ligation) of a nucleic acid linker joining the sample nucleic
acid to a label (e.g., a fluorophore).

Detectable labels suitable for use in the present
invention include any composition detectable by spectroscopic,
photochemical, biochemical, immunochemical, electrical,
optical or chemical means. Useful labels in the present
invention include biotin for staining with labeled
streptavidin conjugate, magnetic beads (e.g., Dynabeads™),
fluorescent dyes (e.g., fluorescein, Texas red, rhodamine,
green fluorescent protein, and the like), radiolabels (e.g.,
3H, 125I, 35S, 14C, or 32P), enzymes (e.g., horseradish
peroxidase, alkaline phosphatase and others commonly used in
an ELISA), and colorimetric labels such as colloidal gold or
colored glass or plastic (e.g., polystyrene, polypropylene,
latex, etc.) beads. Patents teaching the use of such labels
include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350;
3,996,345; 4,277,437; 4,275,149; and 4,366,241.

An oligonucleotide probe array complementary to the
reference sequence or subsequence thereof is immobilized on a
solid support using one of the display strategies described
below. For the purposes of clarity, much of the following
description of the invention will use probe arrays derived
from the Mycobacterium rpoB gene as an example; however, it
should be recognized, as described previously, that probe
arrays derived from other genes may also be used, depending on
the phenotypic trait being monitored, the availability of
suitable primers and the like.

Initially, target nucleic acids derived from
Mycobacterium species having rpoB genes of known sequence and
known drug resistance mutations are screened against a solid
phase probe array derived from sequences complementary to the
Mycobacterium tuberculosis rpoB gene (the Mtb rpoB chip). The
known sequences are either available from the literature or
can be independently established by another method, such as
dideoxynucleotide sequencing. The overall hybridization
pattern observed with each of these species is compared to the
overall hybridization pattern observed with Mycobacterium
tuberculosis and differences between the two hybridization
patterns are derived. A sample derived from the Mycobacterium
tuberculosis (Mt) used as the reference sequence, being
exactly complementary to the probe set(s) on the solid
support, will bind to all the probes. Samples derived from
other organisms, which contain one or more polymorphisms at
the genotypic level, will not display similar binding. The
observed patterns will vary as a function of the variation in
the sequences of the rpoB genes of the individual species.
Subsequences identical to Mt will generate hybridization
subpatterns identical to the subpattern observed with Mt for
that corresponding subsequence. Subsequences which differ
from Mt will generate hybridization subpatterns which differ
from the Mt subpattern for that corresponding subsequence.
Thus, the overall hybridization pattern observed with a
particular species allows one to identify regions of the rpoB
gene of that species which differ from that of Mt.

The presence of a different hybridization pattern in
a specified region of the substrate can be correlated with a
probability that the target nucleic acid is from a specific
species. In the idealized case, the differential
hybridization pattern in a single region will allow species
identification. This can occur when one or more polymorphisms
in that region are uniquely associated with a specific
species. More frequently, however, such a unique one-to-one
correspondence is not present. Instead, differential
hybridization patterns (i.e., relative to the reference
sequence) are observed in multiple regions, none of which will
bear a unique correspondence to a particular species.
However, each differential hybridization pattern will be
associated with a probability of the organism being screened
belonging to a particular species (or not) or carrying a
particular phenotypic trait (or not). As a result, detection
of an increasing number of these sets of differences allows
one to classify the organism with an increasing level of
confidence. Other algorithms can be used to derive such
composite probabilities from the detection of multiple sets of
differences. Therefore, the overall hybridization pattern,
which is the aggregate of all the differential hybridizations
observed at all regions of the substrate, allows one to assign
with high confidence the speciation and/or phenotype of the
organism.
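
One simple way to picture how several region-level
probabilities combine into a composite probability is a naive
Bayes calculation, sketched below in Python. This is an
illustrative assumption, not the algorithm of the invention: it
treats regions as independent, and the function name posterior,
the species names, and every number are hypothetical.

```python
from typing import Dict


def posterior(
    observed: Dict[str, bool],
    likelihoods: Dict[str, Dict[str, float]],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-region evidence into a probability for each species.

    observed:    region -> whether a differential pattern was seen there
    likelihoods: species -> region -> P(differential pattern | species)
    priors:      species -> prior probability
    """
    unnormalized = {}
    for species, prior in priors.items():
        p = prior
        for region, present in observed.items():
            q = likelihoods[species][region]
            p *= q if present else (1.0 - q)
        unnormalized[species] = p
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observations are impossible under every species")
    return {species: p / total for species, p in unnormalized.items()}


# Hypothetical likelihoods for two species and two array regions.
likelihoods = {
    "species_A": {"region_1": 0.9, "region_2": 0.8},
    "species_B": {"region_1": 0.5, "region_2": 0.2},
}
priors = {"species_A": 0.5, "species_B": 0.5}

one_region = posterior({"region_1": True}, likelihoods, priors)
two_regions = posterior({"region_1": True, "region_2": True}, likelihoods, priors)
print(one_region)
print(two_regions)
```

In this toy case the probability assigned to species_A rises
when the second region is added, which is the behaviour the
paragraph above describes: no single region is decisive, but
the aggregate is.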

When a single probe set is being used on the
substrate, it will usually not be able to define the
differences in sequence between the target and the reference
sequence, absent additional knowledge about the target.
Multiple probe sets can be used, as with the tiling strategies
disclosed herein and described in more detail in PCT
publication WO95/11995. In some cases, the differences will
be definable, i.e., the different nucleotide responsible for
the different hybridization pattern will be known. In other
cases the difference will not be definable, i.e., all one will
know is that a polymorphism is present in that region.
However, this is primarily a function of the probe array used
on the chip. If necessary the sample can be screened against
a different probe array to assign the polymorphism present in
that region. Since the point mutations which confer
antibiotic resistance, for example to rifampicin, for Mt are
frequently known, the presence of a change in the
hybridization pattern in the region where the point mutation
occurs signals the presence of a rifampicin resistant species.
It will be apparent that this technique is not limited to
identifying drug resistance. Any phenotypic trait whose
variation has been mapped to mutations in a particular genomic
region can be identified by this method. Representative and
nonlimiting examples include the presence of toxin and
pathogenic markers.

It is important to recognize that this method
provides more than the ability to identify genotypic
variations and thus phenotype by hybridization. Analysis of
hybridization patterns of a single genomic region of a
microorganism with the Mt rpoB chip also provides, as
explained below, a method of identifying the species of the
microorganism.

This chip based screening method allows one to build
up a data base of hybridization patterns corresponding to
different species. Some regions of the hybridization pattern
will be shared among subsets of the species because their
sequences in regions corresponding to those hybridizations are
identical. Other regions of the hybridization pattern will
differ between two species because the sequences corresponding
to those hybridizations are different. In all cases, the
sequences of the rpoB gene of the unknown species are being
compared to the corresponding sequence of Mt. Differences in
the hybridization pattern of a particular species relative to
the pattern observed with Mt as sample can be correlated to the
presence of a polymorphism at a particular point in the
sequence of that species. Some polymorphisms will be
definable, i.e., one will know not only that the nucleotide at
that position differs from that of Mt, but one will also know
the identity of that nucleotide. Some polymorphisms will be
unique to the particular species, i.e., species-specific; they
will be present in that particular species and not in any
other species. Other polymorphisms will be shared, i.e., they
will not be unique to a particular species. Certain subsets
of species will have the polymorphism and others will not.
However, each of these polymorphisms can be assigned to its
particular subset of species. Therefore, the presence of a
shared polymorphism, despite not indicating with certainty
that the sample being screened contains a particular species,
increases the probability that one species of that particular
subset of species is present.

The hybridization pattern of a particular sample can
be represented as a "bar code" in which the individual lines
of the bar code indicate the presence of a polymorphism
relative to Mt. This invention provides a method of screening
large numbers of individual species and thus deriving
information on the polymorphisms present in those species.
Each individual line can be assigned a probability of being
associated with different species. In this fashion, a data
base can be built up in which increasing numbers of
polymorphisms can be associated with the different species.
As one will recognize, the presence of a unique 
species-specific polymorphism will allow the immediate 
identification of a sample as being a particular species. 
However, even the presence of shared polymorphisms among 
several species will allow species identification. In the 
simplest case, each species can be assigned a "fingerprint" 
of shared polymorphisms, i.e., that species and isolates of 
that species will possess a particular collection of shared 
polymorphisms. However, it is not necessary for one to be 
able to assign a unique "fingerprint" pattern of shared 
polymorphisms to a species in order to be able to identify 
that species. As long as one can correlate the presence of a 
particular polymorphism or subset of polymorphisms with a 
probability of the sample being a particular species, the 
detection of increasing numbers of such polymorphisms allows 
one to predict with increasing probability the speciation of 
the sample, i.e., as one observes more and more such 
polymorphisms associated with a particular species, the 
confidence level of the sample being that particular species 
increases. Standard mathematical algorithms can be used to 
make this prediction. Therefore, once the data base is 
sufficiently large, the lack of a "unique" fingerprint for a 
species becomes irrelevant. 

Typically, the mathematical algorithm will make a call of the 
identity of the species and assign a confidence level to that 
call. One can determine the confidence level (>90%, >95%, 
etc.) that one desires and the algorithm will analyze the 
hybridization pattern and either provide an identification or 
not. Occasionally, the call may be that the sample may be one 
of two, three or more species, in which case a specific 
identification will not be possible. However, one of the 
strengths of this technique is that the rapid screening made 
possible by the chip-based hybridization allows one to 
continuously expand the data base of patterns and 
polymorphisms to ultimately enable the identification of 
species previously unidentifiable due to lack of sufficient 
information. 
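
The probabilistic call described above can be illustrated with a 
short sketch. The text does not name a particular algorithm; the 
example below assumes a naive Bayes update, in which polymorphic 
sites are treated as independent. The function name, the species 
labels, the polymorphism labels and all probabilities are 
hypothetical and chosen only for illustration. 

```python
from math import prod

# Hypothetical data base: probability that an isolate of each species
# shows each polymorphism (illustrative values, not measured data).
LIKELIHOODS = {
    "species_A": {"p1": 0.9, "p2": 0.8, "p3": 0.1},
    "species_B": {"p1": 0.9, "p2": 0.2, "p3": 0.7},
}
PRIORS = {"species_A": 0.5, "species_B": 0.5}


def call_species(observed, likelihoods, priors, threshold=0.95):
    """Return (called species or None, posterior probabilities).

    `observed` is the set of polymorphisms detected in the sample.
    Assumes every species table lists the same polymorphisms and that
    sites are independent (the naive Bayes assumption).
    """
    unnormalized = {}
    for species, prior in priors.items():
        table = likelihoods[species]
        unnormalized[species] = prior * prod(
            table[p] if p in observed else 1.0 - table[p] for p in table
        )
    total = sum(unnormalized.values())
    posterior = {s: v / total for s, v in unnormalized.items()}
    best = max(posterior, key=posterior.get)
    # No identification is given unless the desired confidence is reached.
    return (best if posterior[best] >= threshold else None), posterior


if __name__ == "__main__":
    print(call_species({"p1", "p2"}, LIKELIHOODS, PRIORS, threshold=0.90))
```

In this sketch a polymorphism shared by both species ("p1") leaves 
the call undecided, while each further informative polymorphism 
raises the posterior probability of one species, which mirrors the 
increasing confidence described in the text. 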
Analysis of an increasing number of isolates of a 
known species will allow one to build up a fingerprint that is 
characteristic of that species. However, it is important to 
note that as the total number of analyzed isolates for each 
species is increased, it is unlikely that a single and unique 
core fingerprint will define a Mycobacterium species. Rather, 
it is expected that any particular isolate of a Mycobacterium 
species will have a subset of all possible fingerprints. 
Identification of the Mycobacterium species based on a 
fingerprint pattern will require a classification analysis, 
such as by using a tree-based classification algorithm as 
described below, built upon a collected database consisting of 
species-specific and shared single nucleotide polymorphisms 
(SNPs) and fingerprints. Thus, the chip-based method of 
determining hybridization patterns disclosed herein allows one 
to both build up a data base of polymorphisms associated with 
a particular species and use that data base to identify the 
speciation and phenotypic characteristic of an unknown sample 
from a single hybridization experiment. 

It should also be recognized that since this 
technique rests on differences in hybridization patterns, this 
method of speciation does not rest on knowing the actual 
identity of the polymorphism. The hybridization pattern 
relative to Mt will differ as long as a nucleotide at a 
particular point in the sequence differs from that of Mt. The 
exact nature of the substitution, insertion or deletion, e.g., 
A to T, C or G, is less important than the fact that the 
nucleotide is not A (assuming for the purposes of illustration 
that Mt carries an A at that position). It is not necessary 
that the sample be sequenced in order to identify its 
speciation. 
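
As a minimal sketch of this point, the "bar code" can be reduced to 
the list of positions at which the sample differs from the Mt 
reference, without recording what the difference is. The function 
name and the example strings are hypothetical; "N" is assumed here 
to stand for a position at which no base could be assigned. 

```python
def bar_code(reference_calls, sample_calls):
    """Return the 0-based positions where the sample differs from the reference.

    Only the fact of a difference is recorded, not its identity.
    """
    if len(reference_calls) != len(sample_calls):
        raise ValueError("call strings must have equal length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference_calls, sample_calls))
        if ref != obs
    ]


if __name__ == "__main__":
    reference = "ACGTAC"
    # Three samples with different (or unknown) changes at the same position.
    for sample in ("ACTTAC", "ACCTAC", "ACNTAC"):
        print(sample, bar_code(reference, sample))
```

All three hypothetical samples yield the same bar code, since each 
differs from the reference at the same position regardless of the 
nature of the change. 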

A second layer of confidence can be added to the 
initial determination by analyzing whether the differences in 
hybridization patterns are shared or unique. If the species 
identified is supposed to have either a shared or unique 
polymorphism at a particular site and the chip has in fact 
detected such a polymorphism, then one can be more confident 
in the initial determination. 
Both identification and phenotyping can be 
accomplished based on genotypic determinations of a single 
region of the mycobacteria genome in place of analysis of two 
genomic regions (the rpoB and the 16S rRNA genes). Two 
generic implications can be derived from the successful 
demonstration of the use of high density oligonucleotide 
arrays for mycobacteria identification and antibiotic drug 
sensitivity. First, other genes affecting drug sensitivity 
can be encoded on the high density oligonucleotide arrays (see 
Table 1) and hybridization patterns for each of these 
additional genes can be used to confirm and provide confidence 
measurements for fingerprints derived from the rpoB gene. 
Second, the same chip-based strategy could be employed for 
other eubacteria species which simultaneously could provide 
genotypic information concerning important clinical phenotypes 
(e.g., toxin and pathogen marker genes) as well as 
identification information. 

The preceding discussion has used the Mt rpoB gene 
as an example. It should be recognized that this method is 
generally applicable to other microorganisms and reference 
sequences derived from other genomic regions, such as, for 
example, the human mitochondrial DNA sequence and the MT DNA 
sequence. The length of a reference sequence can vary widely 
from a full-length genome, to an individual chromosome, 
episome, gene, component of a gene, such as an exon or 
regulatory sequences, to a few nucleotides. A reference 
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000, 
5,000, 10,000, 20,000 or 100,000 nucleotides is common. 
Sometimes only particular regions of a sequence are of 
interest. In such situations, the particular regions can be 
considered as separate reference sequences or can be 
considered as components of a single reference sequence, i.e., 
the microbial genome. 

A reference sequence can be any naturally occurring, 
mutant, consensus sequence of nucleotides, RNA or DNA. For 
example, sequences can be obtained from computer data bases, 
publications or can be determined or conceived de novo. 
Usually, a reference sequence is selected to show a high 
degree of sequence identity to envisaged target sequences. 
Often, particularly where a significant degree of divergence 
is anticipated between target sequences, more than one 
reference sequence is selected. Combinations of wild-type and 
mutant reference sequences are employed in several 
applications of the tiling strategy. 

Fig. 32 illustrates an example of a computer system 
that may be used to execute software embodiments of the 
present invention. Fig. 32 shows a computer system 100 which 
includes a monitor 102, screen 104, cabinet 106, keyboard 108, 
and mouse 110. Mouse 110 may have one or more buttons such as 
mouse buttons 112. Cabinet 106 houses a CD-ROM drive 114, a 
system memory and a hard drive (see Fig. 33) which may be 
utilized to store and retrieve software programs incorporating 
code that implements the present invention, data for use with 
the present invention, and the like. Although a CD-ROM 116 is 
shown as an exemplary computer readable storage medium, other 
computer readable storage media including floppy disks, tape, 
flash memory, system memory, and hard drives may be utilized. 
Cabinet 106 also houses familiar computer components such as a 
central processor, system memory, hard disk, and the like. 

Fig. 33 shows a system block diagram of computer 
system 100 that may be used to execute software embodiments of 
the present invention. As in Fig. 32, computer system 100 
includes monitor 102 and keyboard 108. Computer system 100 
further includes subsystems such as a central processor 102, 
system memory 120, I/O controller 122, display adapter 124, 
removable disk 126 (e.g., CD-ROM drive), fixed disk 128 (e.g., 
hard drive), network interface 130, and speaker 132. Other 
computer systems suitable for use with the present invention 
may include additional or fewer subsystems. For example, 
another computer system could include more than one processor 
102 (i.e., a multi-processor system) or a cache memory. 

Arrows such as 134 represent the system bus 
architecture of computer system 100. However, these arrows 
are illustrative of any interconnection scheme serving to link 
the subsystems. For example, a local bus could be utilized to 
connect the central processor to the system memory and display 
adapter. Computer system 100 shown in Fig. 33 is but an 
example of a computer system suitable for use with the present 
invention. Other configurations of subsystems suitable for 
use with the present invention will be readily apparent to one 
of ordinary skill in the art. 

The methods of this invention employ oligonucleotide 
arrays which comprise probes exhibiting complementarity to one 
or more selected reference sequences whose sequence is known. 
Typically, these arrays are immobilized in a high density 
array ("DNA on chip") on a solid surface as described in U.S. 
Patent No. 5,143,854 and PCT patent publication Nos. WO 
90/15070, WO 92/10092 and WO 95/11995, each of which is 
incorporated herein by reference. 

Various strategies are available to order and 
display the oligonucleotide probe arrays on the chip and 
thereby maximize the hybridization pattern and sequence 
information derivable regarding the target nucleic acid. 
Exemplary display and ordering strategies are described in PCT 
patent publication No. WO 94/12305, incorporated herein by 
reference. For completeness, the basic strategy is briefly 
described below. 

The basic tiling strategy provides an array of 
immobilized probes for analysis of target sequences showing a 
high degree of sequence identity to one or more selected 
reference sequences. The strategy is illustrated for an array 
that is subdivided into four probe sets, although it will be 
apparent that satisfactory results are obtained from one probe 
set (i.e., a probe set complementary to the reference sequence 
as described earlier). 

A first probe set comprises a plurality of probes 
exhibiting perfect complementarity with a selected reference 
sequence. The perfect complementarity usually exists 
throughout the length of the probe. However, probes having a 
segment or segments of perfect complementarity that is/are 
flanked by leading or trailing sequences lacking 
complementarity to the reference sequence can also be used. 
Within a segment of complementarity, each probe in the first 
probe set has at least one interrogation position that 



WO 97/29212 PCT/US97/02102 


corresponds to a nucleotide in the reference sequence. That 
is, the interrogation position is aligned with the 
corresponding nucleotide in the reference sequence, when the 
probe and reference sequence are aligned to maximize 
complementarity between the two. If a probe has more than one 
interrogation position, each corresponds with a respective 
nucleotide in the reference sequence. The identity of an 
interrogation position and corresponding nucleotide in a 
particular probe in the first probe set cannot be determined 
simply by inspection of the probe in the first set. As will 
become apparent, an interrogation position and corresponding 
nucleotide is defined by the comparative structures of probes 
in the first probe set and corresponding probes from 
additional probe sets. 

In principle, a probe could have an interrogation 
position at each position in the segment complementary to the 
reference sequence. Sometimes, interrogation positions 
provide more accurate data when located away from the ends of 
a segment of complementarity. Thus, typically a probe having 
a segment of complementarity of length x does not contain more 
than x-2 interrogation positions. Since probes are typically 
9-21 nucleotides, and usually all of a probe is complementary, 
a probe typically has 1-19 interrogation positions. Often the 
probes contain a single interrogation position, at or near the 
center of the probe. 

For each probe in the first set, there are, for 
purposes of the present illustration, up to three 
corresponding probes from three additional probe sets. Fig. 1 
illustrates the basic "tiling" strategy of the invention. 
Thus, there are four probes corresponding to each nucleotide 
of interest in the reference sequence. Each of the four 
corresponding probes has an interrogation position aligned 
with that nucleotide of interest. Usually, the probes from 
the three additional probe sets are identical to the 
corresponding probe from the first probe set with one 
exception. The exception is that at least one (and often only 
one) interrogation position, which occurs in the same position 
in each of the four corresponding probes from the four probe 
sets, is occupied by a different nucleotide in the four probe 
sets. For example, for an A nucleotide in the reference 
sequence, the corresponding probe from the first probe set has 
its interrogation position occupied by a T, and the 
corresponding probes from the additional three probe sets have 
their respective interrogation positions occupied by A, C, or 
G, a different nucleotide in each probe. Of course, if a 
probe from the first probe set comprises trailing or flanking 
sequences lacking complementarity to the reference sequences 
(see Fig. 2), these sequences need not be present in 
corresponding probes from the three additional sets. Likewise, 
corresponding probes from the three additional sets can 
contain leading or trailing sequences outside the segment of 
complementarity that are not present in the corresponding 
probe from the first probe set. Occasionally, the probes from 
the additional three probe sets are identical (with the 
exception of interrogation position(s)) to a contiguous 
subsequence of the full complementary segment of the 
corresponding probe from the first probe set. In this case, 
the subsequence includes the interrogation position and 
usually differs from the full-length probe only in the 
omission of one or both terminal nucleotides from the termini 
of a segment of complementarity. That is, if a probe from the 
first probe set has a segment of complementarity of length n, 
corresponding probes from the other sets will usually include 
a subsequence of the segment of at least length n-2. Thus, 
the subsequence is usually at least 3, 4, 7, 9, 15, 21, or 25 
nucleotides long, most typically, in the range of 9-21 
nucleotides. The subsequence should be sufficiently long, or 
hybridization conditions such as to allow a probe to hybridize 
detectably more strongly to a variant of the reference 
sequence mutated at the interrogation position than to the 
reference sequence. 
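
The construction of four corresponding probes for one nucleotide of 
interest can be sketched as follows. This is an illustration only: 
the function names and the example reference are hypothetical, a 
single central interrogation position is assumed, and probes are 
written base-by-base aligned with the reference (strand orientation 
is ignored for simplicity). 

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right alignment."""
    return "".join(COMPLEMENT[base] for base in seq)


def probe_column(reference, pos, length=11):
    """Return the four probes interrogating reference[pos] (0-based).

    The keys are the nucleotides occupying the interrogation position;
    the probe keyed by the complement of reference[pos] is the perfectly
    complementary probe of the first probe set.
    """
    half = length // 2
    start, end = pos - half, pos + half + 1
    if start < 0 or end > len(reference):
        raise ValueError("probe would extend beyond the reference")
    perfect = complement(reference[start:end])
    return {base: perfect[:half] + base + perfect[half + 1:] for base in "ACGT"}


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # hypothetical reference sequence
    for base, probe in probe_column(reference, pos=6, length=5).items():
        print(base, probe)
```

For the A at position 6 of this hypothetical reference, the probe 
carrying T at the interrogation position is the first-set probe, and 
the probes carrying A, C and G are the three corresponding probes. 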

The probes can be oligodeoxyribonucleotides or 
oligoribonucleotides, or any modified forms of these polymers 
that are capable of hybridizing with a target nucleic acid 
sequence by complementary base-pairing. Complementary base 
pairing means sequence-specific base pairing which includes, 
e.g., Watson-Crick base pairing as well as other forms of base 
pairing such as Hoogsteen base pairing. Modified forms 
include 2'-O-methyl oligoribonucleotides and so-called PNAs, 
in which oligodeoxyribonucleotides are linked via peptide 
bonds rather than phosphodiester bonds. The probes can be 
attached by any linkage to a support (e.g., 3', 5' or via the 
base). 3' attachment is more usual as this orientation is 
compatible with the preferred chemistry for solid phase 
synthesis of oligonucleotides. 

The number of probes in the first probe set (and as 
a consequence the number of probes in additional probe sets) 
depends on the length of the reference sequence, the number of 
nucleotides of interest in the reference sequence and the 
number of interrogation positions per probe. In general, each 
nucleotide of interest in the reference sequence requires the 
same interrogation position in the four sets of probes. 
Consider, as an example, a reference sequence of 100 
nucleotides, 50 of which are of interest, and probes each 
having a single interrogation position. In this situation, 
the first probe set requires fifty probes, each having one 
interrogation position corresponding to a nucleotide of 
interest in the reference sequence. The second, third and 
fourth probe sets each have a corresponding probe for each 
probe in the first probe set, and so each also contains a 
total of fifty probes. The identity of each nucleotide of 
interest in the reference sequence is determined by comparing 
the relative hybridization signals at four probes having 
interrogation positions corresponding to that nucleotide from 
the four probe sets. 
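
A minimal sketch of this comparison is given below. The text does 
not prescribe a decision rule; the example assumes that the target 
base is the complement of the interrogation base of the brightest 
probe, and that no call ("N") is made unless the brightest signal 
exceeds the next brightest by an assumed ratio. The function name, 
the cutoff and the intensity values are hypothetical. 

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities, min_ratio=1.2):
    """Call the target base at one position from four probe signals.

    `intensities` maps each probe's interrogation base (A, C, G, T) to
    its hybridization signal. Returns "N" when the brightest probe is
    not at least `min_ratio` times the runner-up (an assumed cutoff).
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if intensities[best] < min_ratio * intensities[second]:
        return "N"
    return COMPLEMENT[best]


if __name__ == "__main__":
    print(call_base({"A": 120, "C": 95, "G": 110, "T": 900}))
    print(call_base({"A": 100, "C": 95, "G": 98, "T": 105}))
```

For the example in the text, 50 nucleotides of interest tiled with 
four probe sets would require 4 x 50 = 200 probes, and `call_base` 
would be applied once per group of four corresponding probes. 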

In some reference sequences, every nucleotide is of 
interest. In other reference sequences, only certain portions 
in which variants (e.g., mutations or polymorphisms) are 
concentrated are of interest. In other reference sequences, 
only particular mutations or polymorphisms and immediately 
adjacent nucleotides are of interest. Usually, the first 
probe set has interrogation positions selected to correspond 
to at least a nucleotide (e.g., representing a point mutation) 
and one immediately adjacent nucleotide. Usually, the probes 
in the first set have interrogation positions corresponding to 
at least 3, 10, 50, 100, 1000, 20,000, 100,000, 1,000,000, 
10,000,000, or more contiguous nucleotides. The probes 
usually have interrogation positions corresponding to at least 
5, 10, 30, 50, 75, 90, 99 or sometimes 100% of the nucleotides 
in a reference sequence. Frequently, the probes in the first 
probe set completely span the reference sequence and overlap 
with one another relative to the reference sequence. For 
example, in one common arrangement each probe in the first 
probe set differs from another probe in that set by the 
omission of a 3' base complementary to the reference sequence 
and the acquisition of a 5' base complementary to the 
reference sequence. Figure 3 illustrates an incremental 
succession of probes in a basic tiling strategy. 
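
The incremental succession of overlapping probes can be pictured as 
a window sliding one base at a time along the reference. The sketch 
below lists the reference subsequences to which successive first-set 
probes would be complementary; the function name and the example 
sequence are hypothetical. 

```python
def tile(reference, length):
    """Return every window of `length` bases, each shifted by one base."""
    return [reference[i:i + length] for i in range(len(reference) - length + 1)]


if __name__ == "__main__":
    for window in tile("ACGTACGT", 5):
        print(window)
```

Each window drops one base at one end and gains one at the other 
relative to its predecessor, so a reference of L bases tiled with 
probes of length k yields L - k + 1 windows. 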

The number of probes on the chip can be quite large 
(e.g., 10^5-10^6). However, often only a relatively small 
proportion (i.e., less than about 50%, 25%, 10%, 5% or 1%) of 
the total number of probes of a given length are selected to 
pursue a particular tiling strategy. For example, a complete 
set of octamer probes comprises 65,536 probes; thus, an array 
of the invention typically has fewer than 32,768 octamer 
probes. A complete array of decamer probes comprises 
1,048,576 probes; thus, an array of the invention typically 
has fewer than about 500,000 decamer probes. Often arrays 
have a lower limit of 25, 50 or 100 probes and as many probes 
as 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, etc. The arrays 
can have other components besides the probes such as linkers 
attaching the probes to a support. 
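
The probe counts quoted above follow from the four possible 
nucleotides at each position of a probe, i.e., 4^n probes of length 
n. A short check, with a hypothetical function name: 

```python
def complete_set_size(length):
    """Number of distinct probes of a given length over A, C, G and T."""
    return 4 ** length


if __name__ == "__main__":
    for n in (8, 10):
        total = complete_set_size(n)
        print(n, total, total // 2)  # length, complete set, 50% of the set
```

Half of the complete octamer set is 32,768 probes, which matches the 
upper bound given in the text for an array using less than about 50% 
of all possible probes of that length. 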

Some advantages of the use of only a proportion of 
all possible probes of a given length include: (i) each 
position in the array is highly informative, whether or not 
hybridization occurs; (ii) nonspecific hybridization is 
minimized; (iii) it is straightforward to correlate 
hybridization differences with sequence differences, 
particularly with reference to the hybridization pattern of a 
known standard; and (iv) the ability to address each probe 
independently during synthesis, using high resolution 
photolithography, allows the array to be designed and 
optimized for any sequence. For example, the length of any 
probe can be varied independently of the others. 

For conceptual simplicity, the probes in a set are 
usually arranged in order of the sequence in a lane across the 
chip, although this arrangement is not required. For example, 
the probes can be randomly distributed on the chip. A lane 
contains a series of overlapping probes, which represent, or 
tile across, the selected reference sequence (see Fig. 3). 
The components of the four sets of probes are usually laid 
down in four parallel lanes, collectively constituting a row 
in the horizontal direction and a series of 4-member columns 
in the vertical direction. Corresponding probes from the four 
probe sets (i.e., complementary to the same subsequence of the 
reference sequence) occupy a column. Each probe in a lane 
usually differs from its predecessor in the lane by the 
omission of a base at one end and the inclusion of an 
additional base at the other end as shown in Fig. 3. However, 
this orderly progression of probes can be interrupted by the 
inclusion of control probes or omission of probes in certain 
columns of the array. Such columns serve as controls to 
orient the chip, or gauge the background, which can include 
target sequence nonspecifically bound to the chip. 

The probe sets are usually laid down in lanes such 
that all probes having an interrogation position occupied by 
an A nucleotide form an A-lane, all probes having an 
interrogation position occupied by a C nucleotide form a 
C-lane, all probes having an interrogation position occupied by 
a G nucleotide form a G-lane, and all probes having an 
interrogation position occupied by a T (or U) form a T-lane 
(or a U-lane). Note that in this arrangement there is not a 
unique correspondence between probe sets and lanes. Thus, the 
probe from the first probe set is laid down in the A-lane, 
C-lane, A-lane, A-lane and T-lane for the five columns in 
Fig. 4. The interrogation position on a column of probes 
corresponds to the position in the target sequence whose 
identity is determined from analysis of hybridization to the 
probes in that column. Thus, I1-I5 respectively correspond to 
n1-n5 in Fig. 4. The interrogation position can be anywhere in 
a probe but is usually at or near the central position of the 
probe to maximize differential hybridization signals between a 
perfect match and a single-base mismatch. For example, for an 
11 mer probe, the central position is the sixth nucleotide. 
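
The lane assignment can be sketched briefly. Since the first-set 
probe carries the complement of the reference nucleotide at its 
interrogation position, its lane changes from column to column. The 
reference bases "TGTTA" used below are an assumption, chosen only 
because they reproduce the lane order A, C, A, A, T mentioned in the 
text; they are not taken from Fig. 4. Function names are 
hypothetical. 

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def first_set_lanes(reference_bases):
    """Lane (A, C, G or T) holding the first-set probe in each column."""
    return [COMPLEMENT[base] for base in reference_bases]


def central_position(probe_length):
    """1-based central position of a probe (exact for odd lengths)."""
    return probe_length // 2 + 1


if __name__ == "__main__":
    print(first_set_lanes("TGTTA"))
    print(central_position(11))
```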

Although the array of probes is usually laid down in 
rows and columns as described above, such a physical 
arrangement of probes on the chip is not essential. Provided 
that the spatial location of each probe in an array is known, 
the data from the probes can be collected and processed to 
yield the sequence of a target irrespective of the physical 
arrangement of the probes on a chip. In processing the data, 
the hybridization signals from the respective probes can be 
reassorted into any conceptual array desired for subsequent 
data reduction whatever the physical arrangement of probes on 
the chip. 
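
As a sketch of this reassortment, the raw signals can be regrouped 
by interrogated position using a lookup table of probe locations. 
The data structures below (a mapping from chip cell to intensity and 
a mapping from chip cell to interrogated position and interrogation 
base) are assumptions made for illustration. 

```python
def reassort(signals, layout):
    """Regroup raw signals into conceptual columns.

    signals: {(row, col): intensity} as read from the chip.
    layout:  {(row, col): (position, base)} giving, for each cell, the
             interrogated position and the probe's interrogation base.
    Returns {position: {base: intensity}} regardless of physical order.
    """
    columns = {}
    for cell, intensity in signals.items():
        position, base = layout[cell]
        columns.setdefault(position, {})[base] = intensity
    return columns


if __name__ == "__main__":
    # Hypothetical chip on which probes are scattered rather than laned.
    layout = {(0, 0): (1, "T"), (0, 1): (0, "A"), (1, 0): (0, "T"), (1, 1): (1, "A")}
    signals = {(0, 0): 10.0, (0, 1): 20.0, (1, 0): 30.0, (1, 1): 40.0}
    print(reassort(signals, layout))
```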

A range of lengths of probes can be employed in the 
chips. As noted above, a probe may consist exclusively of 
complementary segments, or may have one or more complementary 
segments juxtaposed by flanking, trailing and/or intervening 
segments. In the latter situation, the total length of 
complementary segment(s) is more important than the length of 
the probe. In functional terms, the complementary segment(s) 
of the first probe set should be sufficiently long to allow 
the probe to hybridize detectably more strongly to a reference 
sequence compared with a variant of the reference including a 
single base mutation at the nucleotide corresponding to the 
interrogation position of the probe. Similarly, the 
complementary segment(s) in corresponding probes from 
additional probe sets should be sufficiently long to allow a 
probe to hybridize detectably more strongly to a variant of 
the reference sequence having a single nucleotide substitution 
at the interrogation position relative to the reference 
sequence. A probe usually has a single complementary segment 
having a length of at least 3 nucleotides, and more usually at 
least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25 or 30 bases exhibiting perfect 
complementarity (other than possibly at the interrogation 
position(s) depending on the probe set) to the reference 
sequence. 

In some chips, all probes are the same length. 
Other chips employ different groups of probe sets, in which 
case the probes are of the same size within a group, but 
differ between different groups. For example, some chips have 
one group comprising four sets of probes as described above in 
which all the probes are 11 mers, together with a second group 
comprising four sets of probes in which all of the probes are 
13 mers. Of course, additional groups of probes can be added. 
Thus, some chips contain, e.g., four groups of probes having 
sizes of 11 mers, 13 mers, 15 mers and 17 mers. Other chips 
have different size probes within the same group of four 
probes. In these chips, the probes in the first set can vary 
in length independently of each other. Probes in the other 
sets are usually the same length as the probe occupying the 
same column from the first set. However, occasionally 
different lengths of probes can be included at the same column 
position in the four lanes. The different length probes are 
included to equalize hybridization signals from probes 
depending on the hybridization stability of the 
oligonucleotide probe at the pH, temperature, and ionic 
conditions of the reaction. 

The length of a probe can be important in 
distinguishing between a perfectly matched probe and probes 
showing a single-base mismatch with the target sequence. The 
discrimination is usually greater for short probes. Shorter 
probes are usually also less susceptible to formation of 
secondary structures. However, the absolute amount of target 
sequence bound, and hence the signal, is greater for larger 
probes. The probe length representing the optimum compromise 
between these competing considerations may vary depending on 
inter alia the GC content of a particular region of the target 
DNA sequence, secondary structure, synthesis efficiency and 
cross-hybridization. In some regions of the target, depending 
on hybridization conditions, short probes (e.g., 11 mers) may 
provide information that is inaccessible from longer probes 
(e.g., 19 mers) and vice versa. Maximum sequence information 
can be read by including several groups of different sized 
probes on the chip as noted above. However, for many regions 
of the target sequence, such a strategy provides redundant 
information in that the same sequence is read multiple times 
from the different groups of probes. Equivalent information 
can be obtained from a single group of different sized probes 
in which the sizes are selected to maximize readable sequence 
at particular regions of the target sequence. 

Some chips provide an additional probe set 
specifically designed for analyzing deletion mutations. The 
additional probe set comprises a probe corresponding to each 
probe in the first probe set as described above. However, a 
probe from the additional probe set differs from the 
corresponding probe in the first probe set in that the 
nucleotide occupying the interrogation position is deleted in 
the probe from the additional probe set, as shown in Figure 5. 
Optionally, the probe from the additional probe set bears an 
additional nucleotide at one of its termini relative to the 
corresponding probe from the first probe set (shown in 
brackets in Fig. 5). The probe from the additional probe set 
will hybridize more strongly than the corresponding probe from 
the first probe set to a target sequence having a single base 
deletion at the nucleotide corresponding to the interrogation 
position. Additional probe sets are provided in which not 
only the interrogation position, but also an adjacent 
nucleotide is deleted. 
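
Deriving a deletion probe from a first-set probe can be sketched as 
below. The function name, the 0-based index convention and the 
example probe are hypothetical; the optional terminal nucleotide is 
simply appended at the right-hand end here, which is one of the two 
possibilities the text allows. 

```python
def deletion_probe(probe, interrogation_index, extra_terminal=""):
    """Remove the nucleotide at the interrogation position (0-based).

    `extra_terminal`, if given, is an additional nucleotide appended at
    one terminus so that the probe keeps its original length.
    """
    if not 0 <= interrogation_index < len(probe):
        raise IndexError("interrogation position outside the probe")
    shortened = probe[:interrogation_index] + probe[interrogation_index + 1:]
    return shortened + extra_terminal


if __name__ == "__main__":
    print(deletion_probe("TGTCT", 2))
    print(deletion_probe("TGTCT", 2, extra_terminal="A"))
```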

Similarly, other chips provide additional probe sets 
for analyzing insertions. For example, one additional probe 
set has a probe corresponding to each probe in the first probe 
set as described above. However, the probe in the additional 
probe set has an extra T nucleotide inserted adjacent to the 
interrogation position. See Fig. 5 (the extra T is shown in a 
square box). Optionally, the probe has one fewer nucleotide 
at one of its termini relative to the corresponding probe from 
the first probe set (shown in brackets). The probe from the 
additional probe set hybridizes more strongly than the 
corresponding probe from the first probe set to a target 
sequence having an A insertion to the left of nucleotide "n" 
of the reference sequence in Fig. 5. Similar additional probe 
sets can be constructed having C, G or A nucleotides inserted 
adjacent to the interrogation position. 

Usually, four such additional probe sets, one for 
each nucleotide, are used in combination. Comparison of the 
hybridization signal of the probes from the additional probe 
sets with the corresponding probe from the first probe set 
indicates whether the target sequence contains an insertion. 
For example, if a probe from one of the additional probe sets 
shows a higher hybridization signal than a corresponding probe 
from the first probe set, it is deduced that the target 
sequence contains an insertion adjacent to the corresponding 
nucleotide (n) in the target sequence. The inserted base in 
the target is the complement of the inserted base in the probe 
from the additional probe set showing the highest 
hybridization signal. If the corresponding probe from the 
first probe set shows a higher hybridization signal than the 
corresponding probes from the additional probe sets, then the 
target sequence does not contain an insertion to the left of 
the corresponding position ("n" in Fig. 5) in the target 
sequence. 
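
The decision rule just described can be written out as a short 
sketch. The function name and the signal values are hypothetical, 
and no noise threshold is applied, although a practical 
implementation would presumably require one. 

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_insertion(first_set_signal, insertion_signals):
    """Return the base inserted in the target, or None if no insertion.

    `insertion_signals` maps the base inserted in each additional probe
    (A, C, G, T) to that probe's hybridization signal. An insertion is
    called only when the brightest insertion probe exceeds the signal of
    the corresponding first-set probe; the inserted target base is the
    complement of the base inserted in that probe.
    """
    best = max(insertion_signals, key=insertion_signals.get)
    if insertion_signals[best] > first_set_signal:
        return COMPLEMENT[best]
    return None


if __name__ == "__main__":
    signals = {"A": 50, "C": 60, "G": 40, "T": 800}
    print(call_insertion(200, signals))
    print(call_insertion(900, signals))
```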

Other chips provide additional probes 
(multiple-mutation probes) for analyzing target sequences having 
multiple closely spaced mutations. A multiple-mutation probe 
is usually identical to a corresponding probe from the first 
set as described above, except in the base occupying the 
interrogation position, and except at one or more additional 
positions, corresponding to nucleotides in which substitution 
may occur in the reference sequence. The one or more 
additional positions in the multiple-mutation probe are 
occupied by nucleotides complementary to the nucleotides 
occupying corresponding positions in the reference sequence 
when the possible substitutions have occurred. 

Another aspect of the invention derives 
hybridization patterns from a chip with a first probe set 
comprising a plurality of probes of perfect complementarity to 
the reference sequence, and optionally, one or more additional 
probe sets, each additional set comprising probes 




corresponding to a probe in the first set with an
interrogation position for a nucleotide of interest. The
probes in the additional probe sets differ from their
corresponding probes in the first probe set by having a
different nucleotide in the interrogation position. The
overall hybridization pattern is derived by plotting the
maximum hybridization intensity observed from target
hybridization to the group of probes consisting of a probe in
the first probe set and its corresponding probes in the
additional probe sets versus the nucleotide position of the
target being interrogated by this group of probes. Thus, in
this method, the probes are grouped so that all the probes in
a particular group interrogate a common nucleotide position in
the sequence being analyzed. These groups are referred to as
groups of interrogatory probes. For example, with reference to
Fig. 4, the first column of probes with interrogation position
I1 is interrogating position n1, the second column of probes
with interrogation position I2 is interrogating position n2,
and so on. In the case described above in Fig. 1, where the
corresponding probe from the first probe set has a T
nucleotide in the interrogation position and the corresponding
probes from the other three probe sets have a C, G, and A
nucleotide respectively, there would be a total of four probes
interrogating the particular nucleotide of interest at that
position of the sample sequence being analyzed. One measures
the highest of the intensities observed from each of these
probes and assigns that measured value as being the maximum
hybridization intensity corresponding to that position of the
sample sequence. This determination is then repeated
iteratively for the remaining nucleotide positions of the
sample sequence being analyzed, allowing one to obtain a plot
of hybridization intensity vs. nucleotide position.
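
The derivation of such a plot can be sketched in a few lines of
Python. This is a minimal illustration only: the function name
max_intensity_profile, the dictionary layout and the intensity
values are hypothetical and are not taken from the specification.

```python
def max_intensity_profile(probe_intensities):
    """Return [(position, maximum intensity)] sorted by position.

    probe_intensities maps each interrogated nucleotide position to the
    intensities of the probes in its group of interrogatory probes
    (hypothetical layout: {position: {probe_base: intensity}}).
    """
    return [
        (position, max(group.values()))
        for position, group in sorted(probe_intensities.items())
    ]


# Made-up intensities for three positions; the last position is
# interrogated by a single probe, so its "maximum" is that probe's value.
example = {
    1: {"A": 120.0, "C": 35.0, "G": 40.0, "T": 310.0},
    2: {"A": 25.0, "C": 280.0, "G": 30.0, "T": 45.0},
    3: {"A": 90.0},
}
profile = max_intensity_profile(example)
print(profile)
```

The resulting list of (position, intensity) pairs is the data that
would be plotted as hybridization intensity vs. nucleotide position.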

It should be recognized that it is not necessary
that there be additional sets of corresponding probes which
interrogate all four possible nucleotide polymorphisms at a
particular position. The method described immediately above
measures both the maximum hybridization intensity at a
particular position and also how that maximum intensity
changes from that position to the next adjacent position as
one scans or "tiles through" the sequence of the target.
Therefore, the chip can use a single probe set complementary
to the reference sequence; multiple probe sets, each of which
interrogates a particular position in the target by varying
the corresponding nucleotide of interest at that position; or
even additional probe sets which are of different lengths to
the first and additional probe sets comprising the first set
of groups of interrogatory probes.

For example, one can use a chip with a single probe
set complementary to the reference that tiles across the
reference sequence. In this case, despite there being only
one probe which interrogates a particular nucleotide position
of the sample, one plots the hybridization intensity of that
particular probe as a function of the nucleotide position
being interrogated. In this case, there is no "maximum"
hybridization intensity because each position is being
"interrogated" by just one probe. However, one can still
derive from the image plots of hybridization intensity as a
function of nucleotide position of the sequence being analyzed
and build up databases which correlate genotype with the
derived plot. This method, using one set of probes
(complementary to the reference sequence), is described in
U.S. patent application Serial No. 08/531,137, filed October
16, 1995. A plot obtained from such a method is shown in Fig.
17. This entire plot is derived from the image gathered from a
single hybridization experiment.

As the sample sequence being analyzed varies, the
shape of this plot will also vary. For the purposes of
clarity of explanation, the following discussion uses a chip
with probes complementary to a reference sequence from the
Mycobacterium tuberculosis rpoB gene, and the plot of maximum
hybridization intensity derived from the image observed when
the reference sequence (i.e., from an M. tuberculosis sample)
is hybridized to the chip is called the baseline or reference
plot (or pattern). Target sequences from species of
Mycobacterium other than tuberculosis will give plots of
different shape. Hybridization experiments with targets of
known speciation thus provide a database in which each of
these differently shaped plots is correlated with speciation
or another genotypic feature, which in turn allows one to
predict the presence of a phenotype. It should be apparent
that any gene of interest can be tiled across the chip and
that the hybridization pattern derived from the image on the
chip from any other sample suspected of containing that gene
or a polymorphic variant thereof can be used to detect the
presence or absence of a particular variant of that gene in
the sample.

Fig. 17 shows the plot of hybridization intensity as
a function of position being interrogated along the reference
sequence for the case where the reference sequence is from the
Mycobacterium tuberculosis rpoB gene (the Mtb chip described
in the Examples) and the target is M. gordonae. Fig. 18 shows
similar plots obtained with other Mycobacterium species. It
is also noteworthy that species other than M. tuberculosis
produce differences in hybridization intensity even from
segments of the sequence which are identical to M.
tuberculosis. Thus this method allows one to derive
information even from subsequences that are identical to M.
tuberculosis. As will be apparent, each species produces a
characteristic pattern. One can pick the pattern obtained
with the reference sequence, in this case the M. tuberculosis
rpoB, as being the baseline (or reference) pattern and overlay
the pattern from the target over the pattern from the
reference to detect differences from the reference. Fig. 19
shows such an overlay of patterns from different targets
versus Mtb as observed on an Mtb chip. As one expects, when
the target is M. tuberculosis (bottom panel) the overlay is
perfect, whereas when the target is not M. tuberculosis
differences are present. Each of these patterns is thus a
"fingerprint" for that particular species. Once a database of
fingerprints is established, one can compare the corresponding
pattern obtained from an unknown target to either conclusively
identify the target as being a particular species, or exclude
the possibility of that target being any one of the species
represented in the database. Figs. 20A-20D show such a
comparison of an unknown patient sample to fingerprints from
four Mycobacterium species, showing a match and thus
identifying the unknown as being M. gordonae. Figure 21 shows
a similar identification of two other samples (6 and 7,
previously incorrectly identified as M. avium by another
technique) as M. xenopi and M. intracellulare.

As the above discussion indicates, there are several
ways of plotting hybridization intensity versus nucleotide
position of the sample, all of which provide patterns which
are characteristic of a genotypic difference. As such, this
invention is not limited to using plots derived by the
specific protocols disclosed herein. As explained earlier,
identification of genotype and genotypic differences also
allows the prediction of a phenotype.

It should also be recognized that the sequence used
to generate the baseline pattern against which the target
pattern is compared need not be that derived from a sample in
which the reference sequence was hybridized to the chip. Any
other sample that is related to the target sample may be used
since the method compares differences between the baseline
pattern and the pattern from the unknown target.

High density oligonucleotide arrays may be utilized
to detect drug resistance conferring mutations using
information gathered from gene regions utilized to identify
species of isolates within the genus Mycobacterium. For
example, the 700 base pairs of the rpoB gene of Mycobacterium
may be utilized to detect mutations that confer resistance to
rifampin and to detect polymorphisms which allow for the
identification of Mycobacterium species.

Table 1B indicates the total polymorphic variation
observed among the nine non-tuberculosis species compared to
M. tuberculosis within the 700 base pairs of rpoB. With any
of these non-tuberculosis species there are both species
specific polymorphisms (base positions where the observed
polymorphisms are found only with that species) and shared
polymorphisms (base positions which have polymorphisms found
in some of the isolates of that species and some isolates of
other Mycobacterium species). Virtually all of these
polymorphisms have never been previously described and
constitute useful and important markers for the identification
of their corresponding species.



Table 1B

Mycobacteria Polymorphic Analysis

Species             Total Polymorphisms   Species Specific   Shared
M. avium                     72                   3             69
M. chelonae                 106                   8             48
M. fortuitum                103                  21             81
M. gordonae                 102                  26             76
M. intracellulare            59                   3             56
M. kansasii                  84                  12             72
M. scrofulaceum              62                   2             60
M. smegmatis                101                  10             91
M. xenopi                    73                  13             60



16S rRNA sequences are commonly utilized to identify
species of Mycobacterium. However, analysis of the
hybridization pattern using the rpoB gene indicates that there
are some isolates that have been misclassified. For example,
two Mycobacterium isolates received from the California Public
Health department, 96-1761 and 95-1760, were indicated as M.
avium isolates. When the rpoB gene was utilized, it was
determined that the most similar match was with M.
intracellulare (a close relative of M. avium).

The following will describe an embodiment that 
identifies the species within a genus to which an organism 
belongs. However, the process is generally applicable to 
assigning groups to organisms, where the groups may be 
species, subspecies, phenotypes, genotypes, and the like. 
Accordingly, the description that follows illustrates one 
embodiment of the invention. 

Fig. 35 shows a computer-implemented flowchart of a
method of identifying a species within a genus to which an
organism belongs. At step 300, species of nucleic acid
sequences from known organisms are input. These nucleic acid
sequences will be called "known nucleic acid sequences."
Additionally, at step 302, hybridization patterns for the
known nucleic acid sequences are input. The hybridization
patterns indicate the hybridization affinity of subsequences
of the known nucleic acid sequences to subsequences of a
reference nucleic acid sequence. For example, the subsequences
of the reference nucleic acid sequence may be portions of
nucleic acid probes on a chip.

At step 304, a database of the species and
hybridization patterns of the known nucleic acid sequences may
be generated. As with other steps, this step is optional but
may make identifying species more efficient.

The system compares the hybridization pattern of a
sample nucleic acid sequence to the hybridization patterns for
the known nucleic acid sequences, which may optionally be
stored in a database, at step 306. At step 308, the system
determines the species of the organism from which the sample
nucleic acid was obtained according to the hybridization
pattern of the known nucleic acid sequences that most closely
matches the hybridization pattern of the sample nucleic acid
at specific locations. Although an overall pattern matching
technique may be utilized, one may also analyze species
specific polymorphic locations and/or shared polymorphic
locations. Additionally, a combination of hybridization
patterns may be utilized to identify the species of the sample
nucleic acid sequence.

Comparing the hybridization patterns may be done by
any number of known techniques. In a preferred embodiment,
linear regression is utilized across all or selected base
positions to normalize the hybridization intensities. A
regression coefficient from the linear regression is then
utilized to measure the closeness of the hybridization
intensities of the hybridization patterns and, therefore, the
nucleic acid sequences. Additionally, depending on how
closely the hybridization pattern of the sample nucleic acid
matches a hybridization pattern of the known nucleic acid
sequences, the system may calculate a probability that the
identified species for the sample is correct, as indicated at
step 310.
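
A regression-based comparison of this kind might be sketched as
follows. This is an illustrative sketch under stated assumptions,
not the implementation of the preferred embodiment: the correlation
coefficient r of a least-squares fit is assumed to serve as the
closeness measure, and the names regression_fit and best_match, the
species labels and all intensity values are hypothetical.

```python
import math


def regression_fit(x, y):
    """Least-squares fit y ~ slope * x + intercept; returns (slope, intercept, r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def best_match(sample, database):
    """Return the database entry whose pattern correlates best with the sample."""
    scores = {
        name: regression_fit(pattern, sample)[2]
        for name, pattern in database.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical log-intensity patterns over six base positions.
database = {
    "species_A": [2.0, 2.5, 1.0, 3.0, 2.2, 1.5],
    "species_B": [1.0, 3.0, 2.5, 1.2, 2.8, 2.0],
}
# A sample with the same shape as species_A but a different scale and offset.
sample = [0.5 * v + 0.3 for v in database["species_A"]]
best, scores = best_match(sample, database)
print(best, scores)
```

Because the fit absorbs an overall scale and offset, a sample that
has the same shape as a stored pattern scores close to 1 even when
its absolute intensities differ.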

Again referring to Fig. 19, the figure shows plots
of hybridization intensities of Mycobacteria species. A DNA
assay was designed for Mycobacterium tuberculosis (Mtb) as the
chip wild-type sequence. This chip will be referred to as the
Mtb chip to indicate that the chip was tiled for Mtb. In
other words, in addition to other probes, there are probes
that are perfectly complementary to Mtb at sequential base
locations. These probes will be referred to as the wild-type
probes or probes complementary to the reference sequence.

Mtb was hybridized to the Mtb chip and the
hybridization intensities of the wild-type probes (here
measured as a logarithmic function of the photon counts) vs.
the base position are shown in the bottom graph identified as
"Mtb vs. M. tuberculosis." The graph illustrates an example
of a hybridization pattern for Mtb. As indicated by the title
of the graph, the graph actually shows the Mtb hybridization
intensity pattern vs. itself so that there are actually two
hybridization patterns superimposed on each other. In the
following paragraphs, the Mtb sequence will be identified as
the reference sequence (i.e., a sequence that is typically
known).

There are many species of Mycobacteria. Numerous
species were hybridized to the Mtb chip and the graphs in Fig.
19 show the hybridization intensities of the wild-type probes
(again measured as a logarithmic function of the photon
counts) vs. the base position. In addition to the
hybridization pattern for the Mycobacteria species, each graph
also shows the hybridization pattern for the reference
sequence, Mtb.

Although 80% of the bases of the different
Mycobacteria species may be the same, each species generates a
unique hybridization pattern or footprint. A sample sequence
which is known to be a Mycobacteria species (e.g., from
previous base calling algorithms or dideoxy sequencing) may be
similarly hybridized to the Mtb chip. The hybridization
pattern that results may be compared to the hybridization
patterns of known sequences to determine the identity of the
sample sequence.

The Mycobacteria species themselves (or other
species) may have enough similarities that the base calling
algorithm is able to identify the sample as a Mycobacteria
species. The species may also have enough differences that
this method is able to identify the species according to the
hybridization pattern.

Although in this example the chip wild-type
sequence and the reference sequence were the same sequence,
different sequences may be utilized. The hybridization
patterns discussed were generated by the hybridization
intensities of the wild-type probes. However, hybridization
patterns may be generated in other ways, including utilizing
hybridization intensities of the highest intensity probe at
each base position. Additionally, the method may be utilized
on other species or even unrelated nucleic acid sequences to
identify a sample sequence.

Typically, the hybridization differences observed
between different species are large, whereas, as expected, the
differences between different isolates of the same species are
smaller. Therefore, one can set the cutoff of the
discriminating pattern matching function to whatever
predetermined level is desired, depending on whether one is
attempting to assign speciation or track an isolate. Figure
22 shows the patterns observed with different isolates of M.
gordonae and their comparison to a single isolate of M.
gordonae (ATCC isolate).

It should be noted that derivation of hybridization
intensity vs. nucleotide position patterns and their
correlation with patterns of known identity does not require
that one identify the base present at a particular position of
the target or sequence the target. Instead, one determines
the maximum hybridization intensity observed from any of the
one or more probes which are interrogating for the presence of
a nucleotide identical to that of the reference sequence at
the corresponding position in the target and plots how this
changes as a function of base position. The pattern thus
obtained is compared to a database of patterns from organisms
of known speciation to establish the presence or absence of a
match. Thus, there is no necessity to "call" or identify any
of the bases in the target sequence in order to make an
assignment.

Once differences between the target sequence pattern 
and the baseline reference pattern are established, these 
differences can be used in the same manner as the presence of 
differences in nucleotide sequence between target and 
reference to derive probabilities that the presence of a 
certain level of difference in hybridization intensity at a 
particular position indicates a certain species or genotype.
All the observed differences from the reference can then be 
combined to give a composite probability of the sample being 
of a particular species or genotype. Thus, derivation of 
these patterns of hybridization also allows the use of the 
"bar code" type of identification method described earlier. 
As Fig. 17 shows, using patterns derived from hybridization 
intensity allows one to obtain information from the entire 
sequence, not just the regions where the sequence of the 
target differs from that of the reference. 

One advantage of this method of pattern matching is 
that provided the same set of probes is used on a chip, one 
can use different chips at different times and with different 
concentrations of sample to make the assignment because each 
different species will produce the same and invariant 
hybridization pattern. For example, one does not need to
derive a control pattern from the reference sequence
simultaneously with the analysis of the target in order to
compare the two patterns (target vs. reference sequence), since the
control pattern is invariant and the pattern matching looks at 
the relative changes in maximum hybridization intensities 
between succeeding base positions along the sequence. Thus, 
factors such as amplification conditions and sample 
concentration which would affect hybridization at all sites 
equally can be normalized during the analysis. 
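
The invariance described above can be illustrated numerically. In
the sketch below, which uses made-up intensities and a hypothetical
helper named successive_log_changes, a factor that multiplies every
site equally leaves the changes in log intensity between succeeding
base positions unchanged.

```python
import math


def successive_log_changes(intensities):
    """Change in log10 intensity between each pair of succeeding positions."""
    logs = [math.log10(value) for value in intensities]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


run_1 = [200.0, 150.0, 40.0, 180.0]   # made-up maximum intensities
run_2 = [3.5 * value for value in run_1]   # same pattern, more signal overall

changes_1 = successive_log_changes(run_1)
changes_2 = successive_log_changes(run_2)
print(changes_1)
print(changes_2)
```

The two lists agree to within floating-point rounding, which is why
a uniform factor such as sample concentration does not alter the
relative pattern.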

One will recognize that this method of using
oligonucleotide arrays with such pattern matching techniques
is generally applicable to reference sequences other than the
rpoB gene and can be used to detect any differences between a
target and reference sequence from any gene. By way of
example, and not limitation, the reference sequence can be
from an HIV gene, or from a gene associated with breast cancer
(BRCA-1) or cystic fibrosis. Software is used to plot the
hybridization intensities and compare the pattern so derived
to a pattern from the reference or other sequence to establish
differences between the target and reference, or identity or
lack thereof of the target to a sequence in the database of
patterns.

A polynucleotide sequence can be represented as an
assembly of overlapping oligonucleotides. Therefore, an array
consisting of the set of complementary oligonucleotides to a
specific sequence can be used to determine the identity of a
target sequence, quantitate the amount of the target, or
detect differences between the target sequence and a
reference. Many different arrays can be designed for these
purposes. One such design, termed a tiled array, is depicted
schematically in Fig. 23A.

The use of a tiled array of probes to read a target
sequence is illustrated in Fig. 23B. A P^{15,7} (15-mers varied
at position 7 from the 3' end) tiled array was designed and
synthesized against MT1, a cloned sequence containing 1,311
bases spanning the D-loop, or control region, of the human
mitochondrial DNA. The upper image panel of Fig. 23B shows a
portion of the fluorescence image of mt1 fluorescein-labelled
RNA hybridized to the array. The base sequence can be read by
comparing the intensities of the four probes within each
column. For example, the column labelled 16,493 consists of
the four probes 3' TGACATAGGCTGTAG (SEQ ID NO:1), 3'
TGACATCGGCTGTAG (SEQ ID NO:2), 3' TGACATGGGCTGTAG (SEQ ID
NO:3), and 3' TGACATTGGCTGTAG (SEQ ID NO:4). The probe with the
strongest signal is the probe with the A substitution (A 301,
C 57, G 135, T 110 counts), identifying the base at position
16,493 as a U (complementary to the A probe) in the RNA
transcript. Continuing the process, the rest of the sequence
can be read directly from the hybridization intensities.
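
The read-out of a single column can be sketched as follows, using
the counts quoted above for column 16,493. The function name
call_base and the lookup table are illustrative assumptions; the
sketch simply picks the brightest probe and reports the RNA base
complementary to its substituted nucleotide.

```python
# RNA base complementary to each DNA probe nucleotide.
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def call_base(counts):
    """Return (brightest probe base, complementary base in the RNA target)."""
    probe_base = max(counts, key=counts.get)
    return probe_base, RNA_COMPLEMENT[probe_base]


column_16493 = {"A": 301, "C": 57, "G": 135, "T": 110}
probe_base, target_base = call_base(column_16493)
print(probe_base, target_base)
```

For the counts given in the text, the A probe is brightest and the
target base is therefore read as U.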

The detection of a single base polymorphism is shown
in the lower image panel of Fig. 23B. The target hybridized
in the lower panel is MT2, which differs from MT1 in this
region by a T to C transition at position 16,493.
Accordingly, the probe with the G substitution shows the
strongest signal. Since the tiled array was designed to MT1,
neighboring probes which overlap 16,493 are also affected by
the change. Because 15-mer probes are used, a total of 15
columns, or 60 probes, are affected by a single base change in
the target. In the P^{15,7} array, probes in the 8 positions to
the left and 6 positions to the right of the probe set
interrogating the mutation have an additional mismatch to the
target. The result is a characteristic "footprint", or loss
of signal in the probes flanking a mutation position,
reminiscent of the U-shaped curve of Fig. 24. (The data shown
in Fig. 24 are for 8-mer probes, but we have been able to
discriminate single base end position mismatches from perfect
matches even using 20-mers.) Of the four interrogation probes
at each position, signal loss is greatest from the probe
designed to have zero mismatches to MT1. We identify the set
of these designed probes as P^0_{15,7} or simply P^0. In the
other three probe sets, designated P^1, the MT1 signal is
already low as a result of the single base mismatch at the
interrogation position.
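
The extent of such a footprint follows directly from the probe
length and the interrogation position. The sketch below is a
hypothetical helper, not part of the described method; it assumes
that "left" corresponds to lower-numbered columns and that, as in
the text, a probe of length L interrogated at position p has L - p
affected columns on the left and p - 1 on the right.

```python
def footprint_columns(mutation_pos, probe_len=15, interrogation_pos=7):
    """Columns whose probes overlap a single changed base in the target."""
    left = probe_len - interrogation_pos    # 8 columns for a 15-mer varied at 7
    right = interrogation_pos - 1           # 6 columns for a 15-mer varied at 7
    return list(range(mutation_pos - left, mutation_pos + right + 1))


columns = footprint_columns(16493)
affected_probes = 4 * len(columns)   # four interrogation probes per column
print(columns[0], columns[-1], len(columns), affected_probes)
```

For the 15-mer array this gives 15 columns and 60 probes, matching
the counts stated above.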


Comparative Hybridization and Multi-Color Detection 

Patterns of signal intensities and their differences
resulting from mismatches, such as the example shown in Fig.
23B, can be used to advantage in sequence analysis. The loss
of hybridization signal from P^0 is a powerful indicator of
sequence difference between reference and target. This
information is best obtained by hybridizing both the reference
and the target sequence simultaneously to the same array. In
order to extract the maximum amount of useful information from
a simple tiled array, we developed a two-color labelling and
detection scheme, allowing us to use as an internal standard
the hybridization of a reference sample of known sequence
(Fig. 25). The reference is labelled with phycoerythrin (red)
and the unknown target with fluorescein (green). This
approach minimizes or eliminates experimental variability
during the fragmentation, hybridization, washing, and
detection steps. A further advantage is that the sample and
reference targets are in competition, enhancing mismatch
discrimination.

It is also possible to perform the experiment by
hybridizing the reference and unknown to two different chips
in parallel under identical conditions. In this case, only a
single label is required. Using either approach, differences
between two related sequences can be identified from a
straightforward comparison of the scaled hybridization
patterns of the P^0 probes. Differences in P^0 intensities
resulting from a polymorphism extend over a number of
positions and correlate with probe length and substitution
position (Fig. 25). This characteristic large-scale pattern
is more robust and easier to recognize than an intensity
difference at a single position. Since the amplitude of each
P^0 signal is sequence and mismatch dependent, the actual size
and shape of footprints is variable. Thus, sequences can be
identified by directly comparing hybridization amplitude
signatures, rather than by comparing analyzed sequences, which
may contain embedded errors of interpretation. Hybridization
pattern analysis may provide advantages over other methods of
detecting sequence variation, or in some cases may be useful
in conjunction with them.

Polymorphism Screening of the Cytochrome b Gene and Control 
Region 

We have shown how a P^0 probe set in conjunction with
a reference hybridization pattern can be used to analyze
sequences over much larger spans than 600 bp, and how single
base polymorphisms can be read using a tiled array. We
combined the two approaches and used array hybridization
patterns to perform automated basecalling from complex target
sequences. The combination was useful in overcoming most
difficulties. For example, some sequences can cross-hybridize
significantly, particularly when the target is long and
repetitive on a fine scale, or GC rich, even locally. By
analyzing targets in terms of differences from a reference of
known sequence, many potentially confusing signals could be
disregarded because they were the same in both samples. A
second limitation is that if two or more polymorphisms occur
within a single probe span, the resulting destabilization
tends to reduce the accuracy of sequence interpretation,
although the existence of a change can easily be inferred from
the loss of signal (Fig. 25). We adopted an approach that
simply identifies such regions for further analysis, rather
than attempting to read them directly.

After applying an automated basecalling algorithm,
which uses all four interrogation probes for each position and
compares the reference and sample hybridization intensities,
the derived sequences were separated into two categories: one
that could be read directly with high accuracy, and a second
that required further analysis for definitive sequence
assignment. The first category was defined as having a
derived sequence with no more than a single mismatch with each
P^0 probe, and agreement between the derived sequence and P^0
footprint patterns.

The P^0 intensity footprints were detected in the
following way: the reference and sample intensities were
normalized, and R, the average of log10(P^0 reference/P^0
sample) over a window of 5 positions centered at the base of
interest, was calculated for each position in the sequence. To
normalize the sample probe intensities to the reference
intensities, a histogram of the base 10 logarithm of the
intensity ratios for each pair of probes was constructed. The
histogram had a mesh size of 0.01, and was smoothed by
replacing the value at each point with the average number of
counts over a five-point window centered at that point. The
highest value in the histogram was located, and the resulting
intensity ratio was taken to be the most probable calibration
coefficient. Footprints were detected as regions having at
least 5 contiguous positions with a reference or sample
intensity of at least 50 counts above background, and an R
value in the top 10th percentile for the experiment. At 205
polymorphic sites, where the sample was mismatched to P^0, the
mean R value was 1.01, with a standard deviation of 0.57. At
35,333 non-polymorphic sites (i.e., where both reference and
sample had a perfect match to P^0) the mean R value was -0.05,
with a standard deviation of 0.25.
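
The footprint detection steps above might be sketched as follows.
This is an illustration under stated assumptions rather than the
program actually used: all function names and the synthetic
intensities are hypothetical, intensities are assumed positive,
windows are truncated at the ends of the sequence, ties at the
histogram peak are broken in favor of the bin with the most raw
counts, and the percentile cutoff is taken over the positions
supplied.

```python
import math
from collections import Counter


def calibration_offset(ref, sample, mesh=0.01, window=5):
    """Peak of the smoothed histogram of log10(ref/sample), in log10 units."""
    bins = Counter(round(math.log10(r / s) / mesh) for r, s in zip(ref, sample))
    half = window // 2
    smoothed = {
        b: sum(bins.get(b + k, 0) for k in range(-half, half + 1)) / window
        for b in range(min(bins), max(bins) + 1)
    }
    peak = max(smoothed, key=lambda b: (smoothed[b], bins.get(b, 0)))
    return peak * mesh


def windowed_r(ref, sample, offset, window=5):
    """R per position: mean normalized log10 ratio over a centered window."""
    logs = [math.log10(r / s) - offset for r, s in zip(ref, sample)]
    half = window // 2
    values = []
    for i in range(len(logs)):
        chunk = logs[max(0, i - half): i + half + 1]
        values.append(sum(chunk) / len(chunk))
    return values


def find_footprints(ref, sample, background=0.0, min_counts=50.0,
                    min_run=5, top_fraction=0.10):
    """Return (start, end) index pairs of runs that satisfy both criteria."""
    r_values = windowed_r(ref, sample, calibration_offset(ref, sample))
    n_top = max(1, round(len(r_values) * top_fraction))
    cutoff = sorted(r_values)[len(r_values) - n_top]
    flags = [
        r >= cutoff and max(a, b) - background >= min_counts
        for r, a, b in zip(r_values, ref, sample)
    ]
    footprints, start = [], None
    for i, flagged in enumerate(flags + [False]):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_run:
                footprints.append((start, i - 1))
            start = None
    return footprints


# Synthetic data: the sample is half as bright as the reference everywhere,
# except for a 15-position stretch where its signal drops a further 10-fold.
ref = [400.0] * 150
sample = [200.0] * 150
for i in range(100, 115):
    sample[i] = 20.0
footprints = find_footprints(ref, sample)
print(footprints)
```

On real data the percentile cutoff and the treatment of sequence
ends would need more care than this sketch gives them.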

The second category had a derived sequence with 
multiple mismatches and/or disagreement between derived 
sequence and P^0 patterns. For example, the region of ief007
shown in Fig. 25A would fall into the first category, if the 
sequence were called correctly. A false positive basecall 
would lack a footprint or would result in the prediction of 
multiple mismatched probes, and be flagged in either case. 
False negatives are detected by the presence of a footprint 
despite a "wild-type" basecall. 

An example of a false negative basecall resulting
from the use of a limited probe set is shown in Fig. 26A. In
this case, there is a (CA)n length polymorphism, where n = 4 in
MT2 and n = 5 in the reference, MT1. The array is designed to
read (CA)5, but the MT2 target hybridizes sufficiently well to
be read as "wild type". However, a footprint is detected, and
therefore the region is flagged for further analysis. Some
differences in hybridization patterns are secondary effects of
a change elsewhere in the target. An example that is likely
due to a sequence-specific difference in target secondary
structure is shown in Fig. 26A. This example shows that the
interpretation of difference patterns is not always
straightforward. In general, however, the difference patterns
provide a substantial amount of additional information that
aids sequence analysis.

We analyzed a 2.5 kb region of MT DNA spanning the
tRNA^Glu, cytochrome b, tRNA^Thr, tRNA^Pro, control region and
tRNA^Phe DNA sequences. These sequences have very different
functions ranging from protein coding to tRNA structure to
regulatory, and should therefore provide a good comparative
basis for evaluating the different mutation rates of MT DNA.
The P^{20,9} tiling array was used to analyze a total of 12
samples containing 180 (0.59%) base substitutions relative to
MT1. The results are presented in Table 1A. Of the 30,582 bp
analyzed, base substitutions were read in 98% of the sequence
with > 99% accuracy without user intervention. No false
positive calls were made. The remaining 2% of sequence was
flagged for further analysis. This indicates a very high rate
of accuracy, which was obtained for the analysis of 2.5 kb of
sequence at one time. Thus, while the more mature gel-based
technology was able to read clustered and length
polymorphisms, hybridization to a 4N tiled array yielded
comparable results over most of the sequence with considerably
less effort.



Table 1A. Sequence analysis results.

                     Polymorphic Sites (a)      Non-polymorphic Sites
                     TOTAL       Called (b)     TOTAL           Called
Mismatches (c)       0     ≥1    0     ≥1       0       ≥1      0       ≥1
All Positions        134   46    130   35       26883   3465    26883   3457
Unflagged Regions    126   1     126   0        26732   3020    26732   3020
Flagged Regions      8     45    4     35       151     445     151     437

a. Sequence differences relative to the MT1 reference sample. A common
length polymorphism at position 310 was not detected under the conditions
used and was excluded from this analysis. However, this polymorphic site
has previously been shown to be amenable to screening by oligonucleotide
hybridization.

b. Number of sequence positions called correctly by automated basecaller.

c. The P^0 probe for each target base is either perfectly matched to the
target (0) or has ≥1 mismatches as a result of neighboring polymorphisms.

A total of 12 samples containing 180 substitutions
relative to MT1 was analyzed (mt3, mt4, mt5, mt6, ha001,
ha002, ha004, ha007, ief002, ief007, ief011 and yr019).
Results are summed for all 12 samples. All but two of the 180
substitutions were detected as p0 intensity differences (one
of the exceptions was read correctly by basecalling and
automatically flagged for further analysis). In total, only
649 bp (2% of the sequence analyzed) were flagged for further
analysis. Basecalling results are broken down separately for
unflagged and flagged regions. Fully automated basecalling in
unflagged regions had an error rate of 1/127 polymorphisms or
1/29,879 total bp. In contrast, flagged regions, which
included 53 (29%) of the substitutions, contained 14 false
negatives and 8 false positives. However, we estimated that,
on average, 2 to 3 conventional sequencing reactions per
sample, or ~30 reactions in total, would resolve the flagged
regions, to give a basecalling accuracy in excess of 99.9% for
the entire sequence. This represents 8-fold less sequencing
than we used to determine the sequences by conventional
methods alone and a similar saving in the labor-intensive task
of sequence checking and editing. In this experiment, samples
were prepared and hybridized as described in Fig. 25. In
order to provide an independently determined reference
sequence, each 2.5 kb PCR amplicon was sequenced on both
strands by primer-directed fluorescent chain-terminator cycle
sequencing using an ABI 373A DNA sequencer, and assembled and
manually edited using Sequencher 3.0. Hybridization analysis
was also performed on both strands. The analysis presented
here assumes that the sequence amplified from genomic DNA is
essentially clonal, or at least contains one predominant
species, and that its determination by gel-based methods is
correct. PCR amplification errors might contribute a maximum
of ~0.5% sequence difference, essentially randomly
distributed, based on an estimation of ~10^5-fold amplification
and known error rates of Taq polymerase. This would not be
expected to affect significantly the results of gel-based
sequencing or hybridization analysis, particularly when
analyzing differences from a reference hybridization pattern.
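
The error rates quoted above can be rederived from the counts in Table 1A. The following Python sketch is illustrative only: the dictionary layout and the function name summarize are hypothetical, the assignment of each count to the "perfect match" or "≥1 mismatch" column is a reading of the table, and a "false positive" is taken here to mean a non-polymorphic site that was not called correctly.

```python
# Counts read from Table 1A as (perfect-match p0 probe, >=1 mismatch).
# The column assignment is an interpretation of the table layout.
table_1a = {
    "unflagged": {
        "poly_total": (126, 1), "poly_called": (126, 0),
        "nonpoly_total": (26732, 3020), "nonpoly_called": (26732, 3020),
    },
    "flagged": {
        "poly_total": (8, 45), "poly_called": (4, 35),
        "nonpoly_total": (151, 445), "nonpoly_called": (151, 437),
    },
}


def summarize(region):
    """Total sites and miscalls for one region of Table 1A."""
    row = table_1a[region]
    poly_total = sum(row["poly_total"])
    nonpoly_total = sum(row["nonpoly_total"])
    return {
        "polymorphisms": poly_total,
        "total_bp": poly_total + nonpoly_total,
        # Polymorphic sites that were not called correctly.
        "false_negatives": poly_total - sum(row["poly_called"]),
        # Non-polymorphic sites that were not called correctly (assumed meaning).
        "false_positives": nonpoly_total - sum(row["nonpoly_called"]),
    }


unflagged = summarize("unflagged")
flagged = summarize("flagged")
print(unflagged)
print(flagged)
```

Under this reading the unflagged regions contain 127 polymorphisms in 29,879 bp with a single miscall, and the flagged regions contain 53 polymorphisms in 649 bp.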

Basecalling was performed using a Bayesian
classification algorithm based on variable kernel density
estimation. The likelihood of each basecall associated with a
set of hybridization intensity values was computed by
comparing an unknown set of probes to a set of example cases
for which the correct basecall was known. The resulting four
likelihoods were then normalized so that they summed to 1.
Data from both strands were combined by averaging the values.
If the most likely basecall had an average normalized
likelihood of greater than 0.6, it was called; otherwise the
base was called an ambiguity. The example set was derived
from 2 different samples, ib013 and ief005, which have a total
of 35 substitutions relative to MT1, of which 19 are shared



WO 97/29212 PCT/US97/02102 

with the 12 samples analyzed and 16 are not. Base calling
performance was not sensitive to the choice of samples. The
hybridization sequence analysis was fully automated, with no
user editing. In contrast, conventional sequencing required
contig assembly followed by editing, in which > 1% of
basecalls were manually corrected.
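
The decision rule described above (normalize the four likelihoods, average the two strands, call only above 0.6) can be sketched as follows. This is a minimal illustration, not the basecaller itself: the kernel density estimation that produces the likelihoods is not shown, the function names are hypothetical, and the use of "N" to denote an ambiguity is an assumption.

```python
BASES = "ACGT"


def normalize(likelihoods):
    """Scale the four likelihoods so that they sum to 1."""
    total = sum(likelihoods.values())
    return {base: value / total for base, value in likelihoods.items()}


def call_base(sense, antisense, threshold=0.6):
    """Combine both strands and call the base, or 'N' (assumed symbol)."""
    s = normalize(sense)
    a = normalize(antisense)
    # Data from both strands are combined by averaging.
    average = {base: (s[base] + a[base]) / 2 for base in BASES}
    best = max(BASES, key=lambda base: average[base])
    return best if average[best] > threshold else "N"


clear = call_base({"A": 8, "C": 1, "G": 0.5, "T": 0.5},
                  {"A": 7, "C": 1, "G": 1, "T": 1})
unclear = call_base({"A": 5, "C": 4, "G": 0.5, "T": 0.5},
                    {"A": 5, "C": 4, "G": 0.5, "T": 0.5})
print(clear, unclear)
```

In the first case the averaged likelihood for A is 0.75, so A is called; in the second it is 0.5, so the position is reported as ambiguous.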

High Density Oligonucleotide Arrays 

Several technologies have been developed to design,
synthesize, hybridize and interpret high density
oligonucleotide arrays of the type described above.
Representative arrays are described in U.S.
Patent No. 5,143,854 and PCT patent publication Nos. WO
90/15070 and WO 92/10092, each of which is incorporated herein
by reference. Often, arrays have a lower limit of 25, 50 or
100 probes and an upper limit of 1,000,000, 100,000, 10,000 or
1,000 probes. A range of lengths of probes may be employed.
Preferably, each of the high density oligonucleotide arrays
contains 10,000-20,000 oligonucleotide probes (length 10-20
mers) which are used to determine the sequence of a target
nucleic acid (RNA or DNA). Frequently, the density of the
different oligonucleotides is greater than about 60 different
oligonucleotides per 1 cm². The determination of a target
sequence is accomplished by carrying out a single
hybridization reaction involving all of the probes on the
surface of the chip. The following is a brief overview
describing the synthesis, array design, sample
preparation/fluorescent labeling and base calling features of
the DNA chips used in this invention.

Light directed synthesis:

DNA chips use light directed synthesis to build
oligonucleotide probes on the surface of the chip (Fodor, et
al., Science, 251:767-73 (1991)). This light-directed
synthesis combines semiconductor based photolithography and
solid phase chemical synthesis. With reference to Fig. 6, the
process begins when linkers modified with photochemically
removable protecting groups (C) are attached to a solid
substrate, the chip surface. Linkers and phosphoramidites
with photolabile protecting groups have been synthesized and
are described by Pease, et al., PNAS, 91:11241-11245 (1994).
Light is directed through a photolithographic mask to specific
areas of the synthesis surface, activating those areas for
subsequent chemical coupling. The first of a series of
nucleotides (T in Fig. 6) possessing photolabile protecting
groups is incubated with the chip, and chemical coupling
occurs at those sites which have been illuminated in the
preceding step. Light is then directed through a different
section of the mask to the next synthesis site and the
chemical steps are repeated. By repeating these steps, a
defined collection of oligonucleotide probes
can be constructed, each having its own unique address on the
surface of the chip.

Synthesis of complete and subset-combinatorial arrays:

In a light-directed synthesis, the location and
composition of the oligonucleotide products depends upon the
pattern of illumination and the order of chemical coupling
reagents. Consider the synthesis of a chip containing all
possible tetranucleotide oligomers (256 possibilities) (Fig.
7). In cycle 1, mask 1 activates one fourth of the substrate
surface (dT). In round 2 of cycle 1, mask 2 activates a
different quarter of the substrate for coupling with the
second nucleoside (dC). This process is continued to build
four regions of mononucleotides. The masks of cycle 2 are
perpendicular to those of cycle 1, and each synthesis round
generates four new dinucleotides until all 16 possible
dinucleotides are made (Fig. 7). The masks of cycle 3 further
subdivide the synthesis regions so that each coupling round
generates 16 trimers. The subdivision of the substrate is
continued through cycle 4 to form all possible 256 tetramers
(complete combinatorial array). The successful demonstration
of a light-directed complete combinatorial array has recently
been described (Pease, et al., PNAS, 91:11241-11245 (1994)).
It is important to note that any subset of a complete array
can be synthesized by modifying the mask patterns used in each
cycle and round of synthesis. The complete combinatorial
arrays can be used for applications in which de novo
sequencing is sought (Fodor, et al., 1993), while a subset of
combinatorial arrays can be used for resequencing applications
such as will be employed in this application.
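
The cycle-and-round bookkeeping of the complete combinatorial synthesis can be illustrated with a short simulation. This is a sketch under stated assumptions: each synthesis site is identified by the stripe it falls in during each cycle (a hypothetical addressing scheme standing in for the physical mask geometry), and the coupling order after dT and dC is assumed.

```python
from itertools import product

# dT and dC are coupled first, as in the text; the order of the
# remaining two nucleosides is an assumption.
NUCLEOSIDES = "TCGA"


def synthesize(cycles):
    """Simulate a complete combinatorial synthesis of the given length."""
    # One entry per synthesis site; the address records which of the
    # four mask stripes the site belongs to in each cycle.
    sites = {addr: "" for addr in product(range(4), repeat=cycles)}
    coupling_steps = 0
    for cycle in range(cycles):
        for rnd, base in enumerate(NUCLEOSIDES):
            # The mask for this round illuminates every site whose
            # stripe index in this cycle equals the round number.
            for addr in sites:
                if addr[cycle] == rnd:
                    sites[addr] += base
            coupling_steps += 1
    return sites, coupling_steps


array, steps = synthesize(4)
print(len(array), steps)
```

Four cycles of four rounds (16 coupling steps) leave every one of the 256 tetramers at its own address, and restricting which sites a mask illuminates would yield a subset array in the same number of steps.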

Sample preparation and fluorescent labeling of target nucleic
acid:

Oligonucleotide arrays are hybridized to amplification-
generated fluorescently-labeled DNA or RNA and the
hybridizations are detected by epi-fluorescence confocal
microscopy (Fodor, et al., 1993; Molecular Dynamics, Santa
Clara, CA). This process is initiated by the extraction of
target nucleic acids from the sample. With reference to Fig.
8, the target nucleic acid (Mycobacterium genomic DNA) is
amplified by the polymerase chain reaction (PCR) using target
gene specific primer pairs containing bacteriophage RNA
polymerase promoter sequences (Fig. 8). PCR amplified copies
of the target nucleic acid are converted from double stranded
(ds) DNA into fluorescently-labeled single stranded (ss) RNA
during an in vitro transcription (IVT) reaction. Finally, the
fluorescein-labeled target gene specific RNA transcripts are
fragmented into oligomer length targets under elevated
temperature and 30 mM Mg2+. The precise protocols, primers and
conditions for sample extraction, amplification, chip
hybridization and analysis are described in the Examples. Of
course, other labelling strategies may be utilized.

Resequencing chips and detection of single base mismatches: 

As described earlier, a target gene sequence can be
represented on a chip in a series of overlapping
oligonucleotide probes arrayed in a tiling strategy (Fig. 9).
In such a strategy each base in the target is interrogated by
using a collection of 4 oligonucleotide probes which are
identical except for the base located at or near the center of
the probe. Each of the four probes contains dA, dT, dC or dG
at this interrogation position. Of the four oligonucleotide
probes, the one which is the exact complement will produce the
most stable hybrid and thus the strongest fluorescent signal
after post-hybridization washing of the DNA chip. Likewise, the
next nucleotide in the target sequence can be interrogated
with four identical length probes which are the same as the
first four except they are offset one nucleotide downstream.
The central base of these probes also takes all four possible
bases. In like fashion, all of the bases of a target sequence
can be interrogated using overlapping probes arranged in a
tiling strategy. The determination of which of the four
possible probes is the complement to target is made by taking
the ratio of highest to next highest hybridization signal. If
this ratio is greater than 1.2, then a specific determination
of the interrogated base can be made. If the highest
hybridization signal does not meet this criterion then an
ambiguous determination is made based on the IUPAC sequence
codes.
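
The tiling strategy and the 1.2 ratio rule can be sketched in a few lines. The code below is illustrative only: probes are written in the sense of the reference rather than as complements, the probe length of 5 is chosen for readability, and reporting the IUPAC code for the two strongest bases is one assumed reading of "an ambiguous determination ... based on the IUPAC sequence codes".

```python
BASES = "ACGT"

# Two-base IUPAC ambiguity codes.
IUPAC = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}


def tile(reference, probe_len=5):
    """Four probes per interrogated position, differing only at the
    central base. probe_len is assumed to be odd."""
    half = probe_len // 2
    tiles = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        tiles[i] = {b: window[:half] + b + window[half + 1:] for b in BASES}
    return tiles


def call_base(intensities, min_ratio=1.2):
    """Call the brightest base if it beats the runner-up by min_ratio."""
    ranked = sorted(BASES, key=lambda b: intensities[b], reverse=True)
    top, second = ranked[0], ranked[1]
    if intensities[second] > 0 and intensities[top] / intensities[second] <= min_ratio:
        # Assumed convention: report the code covering the top two bases.
        return IUPAC[frozenset((top, second))]
    return top


tiles = tile("ACGTACG")
print(tiles[2])
print(call_base({"A": 1000, "C": 300, "G": 200, "T": 100}))
print(call_base({"A": 1000, "C": 900, "G": 200, "T": 100}))
```

Each successive entry of tiles is offset by one nucleotide, so every interior base of the reference is interrogated by its own set of four probes.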

The sensitivity of DNA chip probes to detect single
base mismatches is illustrated using a 16 step combinatorial
synthesis. The photolabile MeNPoc-dA and MeNPoc-dT were the
only nucleotides used during the synthesis of the probes on
the chip. The lithographic masks were chosen such that each
of 256 octanucleotides was synthesized in four independent
locations on a 1.28 x 1.28 cm chip surface. This yielded an
array of 1024 octanucleotides each occupying a 400 x 400 µm
synthesis region. Following synthesis and phenoxyacetyl
deprotection of the dA amine, the glass substrate was mounted
in a thermostatically regulated hybridization cell. The
target employed for this experiment was an oligonucleotide 5'-
AAAAAAAA-fluorescein-3' present in a 1 nM concentration.
After 30 minute hybridization and washes with 1.0x SSPE at 15
°C, the chip was scanned using an epi-fluorescent confocal
reader. The fluorescent intensities of each of the
hybridization events were plotted against the position of the
mismatches of the probes on the surface of the chip (Fig. 10).
The position zero mismatch (with the perfect complement 3'-
TTTTTTTT-5') is the brightest hybridization on the array with
the background signal of this array at approximately 220
counts. Mismatch position 1 (at the 3' end of the probe) (3'-
ATTTTTTT-5') is the next highest hybridization. The
resulting "U" shaped curve indicates the relative stability of
the mismatches at each position of the probe/target complex.
The mismatches at positions 3, 4, 5 and 6 are very
destabilizing and the intensities of these hybridizations are
approximately 3 fold lower than the perfect match
hybridization. It is also noteworthy that the mismatch at
position 1 (the point where the octanucleotide is tethered to
the chip substrate) is less destabilizing than the
corresponding mismatch at position 8 (5' free end). The
uniformity of the array synthesis and the target hybridization
is reflected in the low variance of the intensities of the
four duplicate synthesis sites.
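
The layout of this test array can be checked by enumeration. In the sketch below, probes are written 3' to 5' so that position 1 is the tethered end, as in the text; the variable names are hypothetical and a square arrangement of the synthesis regions is assumed.

```python
from itertools import product
from math import isqrt

# Perfect complement of the 5'-AAAAAAAA-3' target, written 3' to 5'.
PERFECT = "TTTTTTTT"

# Every octanucleotide that can be built from dA and dT alone.
probes = ["".join(p) for p in product("AT", repeat=8)]


def mismatch_positions(probe):
    """1-based positions (from the tethered 3' end) that differ from
    the perfect complement."""
    return [i + 1 for i, (p, q) in enumerate(zip(probe, PERFECT)) if p != q]


# Probes carrying exactly one mismatch, keyed by mismatch position.
single = {mismatch_positions(p)[0]: p
          for p in probes if len(mismatch_positions(p)) == 1}

sites = len(probes) * 4           # four independent locations per probe
side = isqrt(sites)               # sites per edge, assuming a square grid
chip_edge_cm = side * 400e-4      # 400 micrometre regions, in cm

print(len(probes), sites, side, chip_edge_cm)
print(single[1], single[8])
```

Two nucleotides over eight positions give 2^8 = 256 probes in 16 coupling steps, and 1024 regions of 400 µm fill a 32 x 32 grid that is 1.28 cm on a side.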

Pattern Recognition Algorithm 

Hybridization patterns derived from the
oligonucleotide probe arrays can be correlated with the drug
resistance phenotype and speciation of the organism using
mathematical pattern recognition algorithms, such as tree-
structured classification techniques. It is important to note
that as the total number of analyzed isolates for each species
is increased, it is unlikely that a single and unique core
fingerprint will define a Mycobacterium species. Rather, it
is expected that any particular isolate of a Mycobacterium
species will have a subset of all possible fingerprints.
Identification of the Mycobacterium species based on a
fingerprint pattern will require a classification analysis
built upon a collected database consisting of species specific
and shared SNPs and fingerprints.

The goal of identifying an unknown rpoB
hybridization pattern as coming from one of the Mycobacterium
species in the database is a general classification problem.
Measurements (sequence and fingerprint data) are made on a
collection of samples. Based on these measurements a
systematic way is developed to predict the class (species) of
each member of the collection. The signal produced by the
target at each hybridization site is compared to the signal
produced by MT rpoB. Based on this comparison one determines
whether or not there is a difference in genotype at the
position interrogated at that site in the target relative to
MT rpoB.

Classifier construction is based on past experience. 
In systematic classifier construction, past experience is 
summarized by a learning sample (a.k.a. design or training 
sample). This consists of the measurement data on N cases
observed in the past together with their actual 
classification. It is intended that the database collected in 
Phase I will serve as the initial training sample. There are 
two general types of variables that can appear in the 
measurement data, ordered or numerical variables and 
categorical variables. A variable is called ordered or 
numerical if its measured values are real numbers. A variable 
is categorical if it takes values in a finite set not having 
any natural ordering. In our case, each nucleotide position 
in the sequence is a variable. So all measurement variables 
are categorical. The set of measurement variables for a given
case is called the measurement vector. The measurement space 
is defined as the set of all possible measurement vectors. 

The four most commonly used classification
procedures are discriminant analysis, kernel density
estimation, kth nearest neighbor, and tree-structured
classification. Discriminant analysis assumes that all the
measurement vectors are distributed multivariate normal, and
thus is not set up to handle categorical variables (see
Gnanadesikan, R., Methods for Statistical Data Analysis of
Multivariate Observations (Wiley, New York (1977))). Even
though kernel density estimation and kth nearest neighbor
methods make minimal assumptions about the form of the
underlying distribution of the measurement vectors, there are
still serious limitations common to both methods. They
require a definition of a distance measure (metric) among the
measurement vectors; performance of these classifiers is
sensitive to the choice of the metric. There is no natural or
simple way to handle categorical variables. Both are
computationally expensive as classifiers because they require
that the learning sample be stored, and the distances and
classification rule be recomputed for each new undetermined
case. Most seriously, they give very little usable
information about the data structure. Kanal, L. (1974) IEEE
Trans. Information Theory IT-20:697-722, and Hand, D.J.
Discrimination and Classification (Wiley, Chichester (1981)),
give surveys of the literature on these methods.

Tree-structured classification is a recursive and
iterative procedure. It proceeds by repeated splits of
subsets of the measurement space into two descendant subsets
or nodes, beginning with the full measurement space. The
fundamental approach is to select each split of a node so that
the data in each of the descendant nodes are "purer" than the
data in the parent node. A node impurity measure is defined
such that it is largest when all classes are equally mixed
together in that node, and smallest when the node contains
only one class. The sequence of splits is determined such
that at each candidate node all possible splits are examined
and the split that produces the largest decrease in the
impurity is selected. The terminal nodes form a partition of
the measurement space. Each terminal node is designated by a
class assignment based on the observed proportions of the
classes in that partition. (Usually, the assignment is the
class with the highest proportion.) There may be two or more
terminal nodes with the same class assignment. The partition
corresponding to that class is obtained by putting together
all terminal nodes corresponding to the same class. The tree
classifier predicts a class for a given measurement vector in
the following way: From the definition of the first split, it
is determined whether the measurement vector goes to the right
or to the left. This is repeated until the case moves into a
terminal node. The predicted class is then given by the class
assignment of that terminal node.
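
The split selection step described above can be sketched for categorical sequence data. The text does not name a particular impurity measure; the Gini index is used here as an assumption because it is largest for an even class mixture and zero for a pure node. The function names and the toy learning sample are hypothetical.

```python
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def gini(labels):
    """Gini impurity: 0 for a pure node, maximal for an even mixture."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples):
    """samples: list of (sequence, species).
    Returns (impurity decrease, 0-based position, bases sent left)."""
    n = len(samples)
    parent = gini([species for _, species in samples])
    best = (0.0, None, None)
    # Subsets of size 1 and 2 cover every two-way partition of {A,C,G,T}.
    subsets = [set(c) for size in (1, 2) for c in combinations(BASES, size)]
    for pos in range(len(samples[0][0])):
        for left_bases in subsets:
            left = [sp for seq, sp in samples if seq[pos] in left_bases]
            right = [sp for seq, sp in samples if seq[pos] not in left_bases]
            if not left or not right:
                continue
            child = len(left) / n * gini(left) + len(right) / n * gini(right)
            if parent - child > best[0]:
                best = (parent - child, pos, left_bases)
    return best


learning_sample = [
    ("ACGT", "tuberculosis"), ("ACGT", "tuberculosis"),
    ("ATGT", "gordonae"), ("ATGA", "gordonae"),
]
decrease, position, left_bases = best_split(learning_sample)
print(decrease, position, left_bases)
```

In this toy sample the second nucleotide separates the two species completely, so the split on that position removes all of the parent node's impurity.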

The optimal size of a classification tree is
determined in the following manner: continue the splitting
until all terminal nodes are very small, resulting in a large
tree. This large tree is then selectively pruned upward,
thus producing a decreasing sequence of subtrees. Finally,
use cross-validation or test-sample estimates to choose from
the sequence of subtrees that subtree having the lowest
estimated misclassification rate. The tree-structured
classification methodology is covered in detail in Breiman et
al., Classification and Regression Trees (Wadsworth
International Group, Belmont, California (1984)).

The tree-structured approach is a powerful and
flexible classification tool. It can handle both numerical
and categorical variables in a simple and natural way. The
final classification has a simple form that can be efficiently
used to classify new data. It does automatic stepwise
variable selection and complexity reduction. It provides both
the classification and the estimate of the misclassification
probability for a new case. The output of the tree procedure
gives easily understood and interpreted information regarding
the predictive structure of the data.

Example of a Tree-Structured Classifier

Figure 11 displays a hypothetical six-class tree
(numbers under boxes). The boxes represent nodes. Node t1
contains the whole measurement space and is called the root
node. Nodes t2 and t3 are disjoint with t1 being the union of
t2 and t3. Similarly t4 and t5 are disjoint and t2 is the
union of t4 and t5. Those nodes that are not split, in this
case, t6, t8, t10, t11, t12, t14, t15, t16, and t17 are called
terminal nodes. The numbers beneath the terminal nodes are
the class assignments or class labels for this particular
classifier.

Let x be a measurement vector (in our case a DNA
sequence of length K); x = (x_1, x_2, ..., x_K). The splits are
formed by setting conditions on the coordinates of x. For
example Split 1 of t1 into t2 and t3 could be of the form: t2
is the set of measurement vectors such that x_10 ∈ {A, C} and t3
is the set of measurement vectors such that x_10 ∈ {G, T}.

This classifier predicts a class for the measurement
vector x in this way: From the definition of the first split,
it is determined whether x goes into t2 or t3. For example,
if the above definition for Split 1 is used, x goes into t2 if
the 10th nucleotide in that sequence is either A or C,
otherwise, x goes into t3. If x goes into t2, then from the
definition of Split 2, it is determined whether x goes into t4
or t5, and so on. When x finally moves into a terminal node,
its predicted class is given by the class label attached to
that terminal node.
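
The way a measurement vector is passed down the tree can be sketched as follows. The tree in this example is a hypothetical three-leaf tree chosen for brevity, not the tree of Figure 11; only Split 1 (on the 10th nucleotide) follows the example in the text, and the second split, the class labels and the dictionary layout are assumptions.

```python
# Internal nodes test one 1-based sequence position; terminal nodes
# carry a class label. Everything except Split 1 is made up.
TREE = {
    "t1": {"position": 10, "left_bases": {"A", "C"}, "left": "t2", "right": "t3"},
    "t2": {"position": 3, "left_bases": {"G"}, "left": "t4", "right": "t5"},
    "t3": {"label": "class 1"},
    "t4": {"label": "class 2"},
    "t5": {"label": "class 3"},
}


def predict(sequence, tree=TREE, root="t1"):
    """Return the predicted class and the path of nodes visited."""
    name = root
    path = [name]
    while "label" not in tree[name]:
        node = tree[name]
        base = sequence[node["position"] - 1]
        # Go left if the interrogated base is in the split's base set.
        name = node["left"] if base in node["left_bases"] else node["right"]
        path.append(name)
    return tree[name]["label"], path


print(predict("AAGAAAAAAC"))
print(predict("AAGAAAAAAT"))
```

The first sequence has C at position 10 and G at position 3, so it passes from t1 to t2 to t4; the second has T at position 10 and stops in t3.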

EXAMPLES

Mycobacterium tuberculosis rpoB chip

A high density oligonucleotide array (Mtb rpoB chip)
has been synthesized and tested in preliminary experiments.
The chip has been synthesized using 2 lengths of
oligonucleotides (18 and 20 mers) with the interrogation
position located at bases 9 and 10 (sense and antisense
probes) and 10 and 11 (sense and antisense probes). The Mtb
rpoB chip was used initially to genotype 15 M. tuberculosis
clinical isolates which were previously determined to be RIF
sensitive. Figure 12 is an image of the Mtb rpoB chip
analysis of the 700 bp region of the rpoB gene from one of
these isolates. Oligonucleotide primer sequences, PCR
amplification, in vitro transcription and chip hybridization
conditions were as follows.

Chromosomal DNA from M. tuberculosis was isolated by
suspending one colony in 100 µl of ddH2O, boiling for 10
minutes and briefly centrifuging to separate the DNA solution
from cellular debris. The chromosomal DNA was then diluted
1:10 in ddH2O. A 705 bp rpoB fragment was amplified in a 100
µl reaction volume containing each dNTP at 200 µM, each primer
at 0.2 µM, 2.5 U of Taq-polymerase (BM, Indianapolis, IN), 10
mM Tris (pH 8.3), 50 mM KCl, 1.5 mM MgCl2. The amplification
was carried out in a model 9600 thermocycler (Perkin Elmer
Cetus). To amplify the 705 bp fragment using primers rpoB-4
(CTC GGA ATT AAC CCT CAC TAA AGG GAC CCA GGA CGT GGA GGC GAT
CAC ACC GCA) (SEQ ID NO: 1) and rpoB-7 (TAA TAC GAC TCA CTA TAG
GGA GAC GTC GCC GCG TCG ATC GCC GCG C) (SEQ ID NO: 2) with
incorporated T3 and T7 sequences, 5 min at 95°C, 35 cycles of 1
min at 95°C, 30 sec at 68°C and 2 min at 72°C, followed by 10
min at 72°C, were used. The PCR amplicon was then purified using
Amicon Microcon 100 columns. In vitro transcription of
approximately 50 ng amplicon was performed in a reaction
volume of 20 µl, containing 1.25 mM rNTPs, 10 mM DTT, 125 µM
F-CTP, 20 U RNase inhibitor, 40 mM Tris-HCl (pH 7.5), 6 mM
MgCl2, 2 mM spermidine, 10 mM NaCl, 20 U T3/T7 RNA-polymerase
for 90 min at 37°C. The RNA was then purified with Amicon
Microcon 100 columns and quantitated using a
spectrophotometer. Fragmentation was carried out in 30 mM
MgCl2 at 95°C for 30 min. A 20 nM RNA solution in 6xSSPE, 20%
formamide, 0.005% Triton, 0.5 nM control oligo was heated to
68°C for 10 min, then placed on ice for 5 min and hybridized
to the Mtb rpoB chip for 30 min at 22°C in the Affymetrix
Fluidics Station. The post-hybridization wash was performed
with 1xSSPE, 20% formamide, 0.005% Triton X-100 in the
Affymetrix Fluidics Station (Affymetrix, Santa Clara, CA),
using 12 wash cycles with 2 fills and drains per cycle,
followed by a wash with 6xSSPE, 0.005% Triton, 2 cycles with 2
fills and drains per cycle. The chip was then scanned on a
Molecular Dynamics scanner (Molecular Dynamics, Santa Clara,
CA) at 22°C with a resolution of 11.25 pixels/µm.

As noted, 20% formamide was used in both
hybridization and post-hybridization wash steps since the 700
bp amplicon is 67.7% G:C rich with a 18 bp region which is
73.3% G:C rich. The results from the analysis by the Mtb rpoB
chip indicated that there were no polymorphisms at any base of
the 700 bp for any of the 15 M. tuberculosis RIF sensitive
isolates analyzed. This result was confirmed by conventional
dideoxynucleotide sequencing. Thus, both methods were 100%
concordant in the analysis of 10,500 nucleotides of total
sequence.

Detection of mutations conferring RIF resistance

Four pre-resistant/post-resistant RIF isolate pairs were
screened in a blinded fashion. These were analyzed using this
first generation Mtb rpoB chip. Table 2 summarizes the
results of the chip analysis. Of the 4 isolate pairs, three
pairs were observed to have one member of the pair which
possessed mutations in the 81 bp region (all other nucleotides
were wild type), with the companion isolates displaying no
such mutations. Interestingly, the fourth pair
(001415/001417) contained no mutation at any nucleotide of the
700 bp surveyed, although isolate 001417 was characterized as
RIF resistant by culture assay. Since 10% of RIF resistant M.
tuberculosis isolates have no mutation in the 81 bp region of
rpoB, this isolate may be resistant because of a mutation in
the portion of rpoB not analyzed by the chip or because of a
mutation in some other gene which controls uptake, metabolism
or drug binding. The sequences derived using the chip for all
8 isolates were confirmed using dideoxynucleotide sequencing.
An additional 4 RIF resistant isolates were also screened.
Mutations only within the 81 bp region were detected for each
of these isolates by the Mtb rpoB chip and confirmed by
dideoxynucleotide sequencing. A total of 25 M. tuberculosis
isolates were analyzed. Seven of these were rifampicin
resistant and had the mutations shown in Table 2. Other than
the mutations shown in Table 2, there were no polymorphisms in
any of the 25 isolates.
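
The codon changes reported in Table 2 can be checked for internal consistency against the standard genetic code. The sketch below is illustrative: the codon table contains only the six codons that appear in Table 2, and the function name describe is hypothetical.

```python
# Subset of the standard genetic code covering the codons in Table 2.
CODON_TO_AA = {"TCG": "S", "TTG": "L", "CAC": "H",
               "GAC": "D", "TAC": "Y", "CTC": "L"}

# (amino acid change, wild-type codon, mutant codon) as listed in Table 2.
MUTATIONS = [
    ("S456L", "TCG", "TTG"),
    ("H451D", "CAC", "GAC"),
    ("H451Y", "CAC", "TAC"),
    ("H451L", "CAC", "CTC"),
    ("S447L", "TCG", "TTG"),
]


def describe(change, wild, mutant):
    """Check the codon pair against the amino acid change and locate
    the substituted base within the codon (1-based)."""
    changed = [i + 1 for i in range(3) if wild[i] != mutant[i]]
    consistent = (CODON_TO_AA[wild] == change[0]
                  and CODON_TO_AA[mutant] == change[-1])
    return {"change": change, "codon_positions": changed,
            "consistent": consistent}


results = [describe(*m) for m in MUTATIONS]
for r in results:
    print(r)
```

Each listed mutation is a single-base substitution, which is the kind of change a four-probe tiling set is designed to detect.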

Table 2

RIF Sensitive and Resistant M. tuberculosis Clinical Samples Analyzed
by Mtb rpoB Chip and Confirmed by Dideoxynucleotide Sequencing

Sample      Amino Acid (1)    Nucleotide (1)    Phenotypic RIF Resistance (2)

M0404A      None              None              No
000936      S456L             TCG->TTG          Yes
001415      None              None              No
001417      None              None              Yes
000914      None              None              No
001231      S456L             TCG->TTG          Yes
001587      H451D             CAC->GAC          Yes
SM2341      None              None              No
3407        H451Y             CAC->TAC          Yes
978         H451L             CAC->CTC          Yes
3553        S456L             TCG->TTG          Yes
3466        S447L             TCG->TTG          Yes

1 Amino acid and nucleotide numbering system employs sequencing derived by Miller, et al. (1993).
2 Resistance was determined using relative/proportion method of Small et al., "Tuberculosis: Pathogenesis,
Protection, and Control" pp. 569-586 (1994) (Am. Soc. Microbiol., ed. B.R. Bloom)

Single Nucleotide Polymorphisms and Hybridization Pattern
(Fingerprint) Database for Non-Tuberculosis Mycobacteria.

The first steps in assembling a database (consisting
of SNPs and chip based hybridization fingerprints) capable of
identifying mycobacteria species were taken with the analysis
of 7 clinically important Mycobacteria species: M. gordonae,
M. chelonae, M. kansasii, M. scrofulaceum, M. avium, M.
intracellulare and M. xenopi. As a first step, the 700 bp
region of the rpoB gene from one isolate from each of these
species was sequenced using dideoxynucleotide methodology.
Nucleotide (60-71) and amino acid (5-8) differences were
compared to M. tuberculosis within the 700 bp region for each
of these mycobacteria species (Table 3). Two types of single
nucleotide polymorphisms (SNP) were noted: species specific
(unique) and shared. The SNPs which were shared with at least
3 other non-tuberculosis mycobacteria are numerous and
scattered throughout the 700 bps analyzed. Fig. 13
illustrates the location of these shared SNPs. The species
specific SNPs are, however, considerably fewer. Figure 14
depicts the location and nature of the SNPs for each of the 7
species analyzed based on one isolate for each species.

Table 3

Comparison of Polymorphisms in Mycobacteria Species

                            Number of polymorphic
                      Nucleotides          AA changes
Strain                Total     Unique     Total     Unique

M. gordonae           71        19         5         1
M. chelonae           62        i          5         1
M. avium              60        10         5         0
M. kansasii           67        17         6         4
M. scrofulaceum       63        1          6         1
M. xenopi             72        27         8         7
M. intracellulare     60        7          6         2

When fluorescently-labeled RNA amplicons from each
of the 7 mycobacteria species were hybridized to the Mtb rpoB
chip, the image of the hybridization is considerably different
than when an amplicon for M. tuberculosis was hybridized (Fig.
15A). The differences in the hybridization patterns can be
represented as a bar-code-like fingerprint (Fig. 15B). Each
line on the fingerprint represents a hybridization difference
as compared to when wild type M. tuberculosis is hybridized.
These differences are attributable to the species specific and
shared polymorphisms identified by the dideoxynucleotide
sequencing analysis (Table 3). For any individual
Mycobacteria species, only some of the differences depicted by
individual lines of the fingerprint are identifiable as
specific base pair differences. The remainder of the lines of
the fingerprint can be characterized only as being different
than if a M. tuberculosis sample was hybridized. These
undefined differences are usually caused when multiple
polymorphisms occur in close proximity or they are the result
of the destabilization of the hybridization of neighboring
probes due to the presence of unique or shared polymorphisms.
In such clustered polymorphism cases, there are multiple
mismatches within individual probes interrogating nucleotides
in a region. Hybridization results involving such probes are
unstable, leading to ambiguous, wrong or no calls. Thus, a
full fingerprint pattern is composed of identified (unique or
shared polymorphisms) and unidentified (clustered
polymorphisms leading to no base calling determination)
differences. An average of 27.7% of the 700 bps interrogated
are viewed as different than if a M. tuberculosis target were
hybridized to the chip (Table 4). In other words, for every
base identified by ABI sequencing as being polymorphic, the
chip sees three bases as different from Mtb. This is because
each of the two bases flanking the base
identified by ABI as polymorphic is also viewed as
different by the chip, owing to destabilization of
hybridization at these sites.

Table 4

Differences of Fingerprint Patterns Among
Mycobacteria Species Compared to M. tuberculosis

                      Nucleotide Differences (1)    % Differences (2)

M. gordonae           188                           26.7
M. chelonae           208                           29.5
M. avium              152                           21.5
M. kansasii           216                           30.6
M. scrofulaceum       213                           30.2
M. xenopi             229                           32.4
M. intracellulare     164                           23.2

1 The nucleotide differences are composed of identified differences compared to the M. tuberculosis sequence
(species specific and shared polymorphisms) and unidentified differences (caused by clustered polymorphisms).

2 % differences are based on a total of 700 bp of the Mtb rpoB gene analyzed on the chip.
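
A fingerprint of the kind described above can be represented as the set of interrogated positions whose call differs from the call obtained with the M. tuberculosis target. The sketch below is a minimal illustration: the function names, the toy call strings and the use of "N" for a position with no call are assumptions. It also recomputes the 27.7% average from the percentages in Table 4.

```python
def fingerprint(reference_calls, sample_calls):
    """0-based positions where the sample call differs from the call
    made for the reference target (including ambiguous or no calls)."""
    return [i for i, (r, s) in enumerate(zip(reference_calls, sample_calls))
            if r != s]


def barcode(diff_positions, length):
    """Render a fingerprint as a bar-code-like string."""
    marks = set(diff_positions)
    return "".join("|" if i in marks else "." for i in range(length))


# Toy example; 'N' stands for a position with no call (assumed symbol).
diffs = fingerprint("ACGTACGTAC", "ACNTNNGTAC")
print(barcode(diffs, 10))

# Percent differences as listed in Table 4.
percent_different = {
    "M. gordonae": 26.7, "M. chelonae": 29.5, "M. avium": 21.5,
    "M. kansasii": 30.6, "M. scrofulaceum": 30.2, "M. xenopi": 32.4,
    "M. intracellulare": 23.2,
}
mean_percent = sum(percent_different.values()) / len(percent_different)
print(round(mean_percent, 1))
```

Keeping the fingerprint as a list of positions makes it straightforward to compare isolates position by position.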

Since the database for each of the non-tuberculosis
mycobacteria was the result of analysis of only a single
isolate for each Mycobacteria species, the variation of
fingerprint patterns that would be observed among multiple
isolates of a single Mycobacteria species was explored.
Consequently, the rpoB gene from 10 isolates of M. gordonae
was analyzed by the Mtb rpoB chip. Figure 16 presents the
images of the sense strand hybridization. Below each chip
image is the hybridization fingerprint computed after analysis
of both strands. The shared differences among the 11 (10 new
and 1 original) isolates analyzed are shown below (Table 5).
From this analysis a core (consensus) fingerprint pattern for
M. gordonae was derived (Figure 16). A similar core
fingerprint has been derived for eight other Mycobacterium
species, thus allowing identification of those species. It
will be apparent that the techniques described above can be
used to assemble a database of species-specific and shared
polymorphisms which can be used to derive fingerprints for
other species.

It is important to note that as the total number of
analyzed isolates for each species is increased, it is
unlikely that a single and unique core fingerprint will define
a Mycobacterium species. Rather, it is expected that any
particular isolate of a Mycobacterium species will have a
subset of all possible fingerprints. Identification of the
Mycobacterium species based on a fingerprint pattern will
require a classification analysis, as described earlier, using
the tree-based classification algorithms built upon a
collected database consisting of species-specific and shared
SNPs and fingerprints.

Table 5

Percent Shared Differences Among M. gordonae Clinical Isolates

        golz    gord    gordjd  gordib  gordig  gordil  gordmb  gordow  gordrb  gordwn

golz    0
gord    18.7    0
gordjd  16.7    17.4    0
gordib  19.3    22.7    18.3    0
gordig  23.7    18.7    17.3    19.6    0
gordil  18.3    14.6    15.2    15.5    17.9    0
gordmb  19.7    22.4    18.4    24.3    19.7    15.3    0
gordow  19.6    16.3    16.6    17.0    19.6    18.7    16.7    0
gordrb  20.6    23.3    18.9    24.5    20.6    16.2    25.0    18.4    0
gordwn  17.6    17.4    16.9    17.9    16.9    15.2    18.0    16.2    19.0    0
gorm    20.1    22.7    19.1    24.0    20.9    16.2    24.3    17.6    25.0    18.2


Human Mitochondrial DNA Chip (MT1) 

Fluorescein-labelled target RNA was synthesized and
fragmented, and the transcription mixture diluted 20-fold in
6xSSPE, 0.05% Triton X-100, to give approximately 1 to 10
nM RNA (estimated prior to fragmentation). Hybridization was
carried out for 30 min at RT. The chip was washed for a total
of 5 to 10 minutes in several changes of 6xSSPE, 0.005% Triton
X-100, and scanned.

Fig. 23A shows the design of the tiled array on the
MT1 chip. Each position in the target sequence (upper case)
is interrogated by a set of 4 probes on the chip (lower case),
identical except at a single position, termed the
interrogation base, which is either A, C, G, or T. The target
will be perfectly complementary to one of the 4 probes, but
mismatched with the others. As illustrated in Fig. 10, the
perfect complement gives a more intense hybridization signal
than do the mismatches. Each of the lower three probes
represents a 4 probe set, with n = A, C, G, or T. By tiling
the sets across the sequence in single base increments as
shown, a nucleic acid target of length N can be scanned for
mutations with a tiled array containing 4N probes. (B)
Hybridization to a tiled array and detection of a point
mutation. The array shown was designed to the MT1 target
sequence. When hybridized to MT1 (upper panel), one probe in
each set of 4 in a column is perfectly matched to the target,
while the other three contain a single base mismatch. The
interrogation base used in each row of probes is indicated on
the left of the image. The target sequence can be read 5' to
3' from left to right as the complement of the interrogation
base with the brightest signal. Hybridization to MT2 (lower
panel), which differs from MT1 in this region by a T -> C
transition, affects the probe sets differently. At the
location of the polymorphism, the G probe is now a perfect
match to MT2, with the other three probes having single base
mismatches (A*, C*, G*, T* counts). However, at flanking
positions, the probes have either single or double base
mismatches, since the MT2 substitution now occurs away from
the interrogation position. The location of the mismatch is
illustrated in the probe schematic by red circles.

Detection of base differences between a sample and reference
sequence in 2.5 kb by comparison of scaled p° 20,9
hybridization intensity patterns between a sample and a
reference target

In this study, each 2.5 kb target sequence was PCR
amplified directly from genomic DNA using the primer pair
L14675-T3 (5' aattaaccctcactaaagggATTCTCGCACGGACTACAAC) (SEQ ID
NO: 7) and H667-T7, transcribed to give RNA targets labelled
with fluorescein or biotin, pooled and fragmented as
described. In the experiments shown the MT1 reference target
was biotin labelled and the sample target fluorescein
labelled. Targets were diluted 180-fold from the
transcription reaction to a final concentration of ~100 to
1000 pM in 3 M TMACl, 10 mM Tris.Cl pH 8.0, 1 mM EDTA, 0.005%
Triton X-100, and 0.2 nM of a control oligonucleotide, 5'
fluorescein-CTGAACGGTAGCATCTTGAC (SEQ ID NO: 8). (We found that
the G-rich H strand target hybridized poorly in 1 M NaCl, but
hybridized well in 3 M tetramethyl ammonium chloride, whereas
the L strand hybridized well in either solution.)
Hybridization was carried out in packaged chips. Samples were
denatured at 95°C for 5 min, chilled on ice for 5 min, and
equilibrated to 37°C. A volume of 180 µl of hybridization
solution was then added to the flow cell and the chip
incubated at 37°C for 3 h with rotation at 60 rpm on a
laboratory rotisserie. Following hybridization, the chip was
washed 6 times at RT with 6xSSPE, 0.005% Triton X-100. A
solution of 2 µg/ml phycoerythrin-conjugated streptavidin in
6xSSPE, 0.005% Triton X-100, was added, and incubation
continued at RT for 5 min. The chip was washed again, and
scanned at a resolution of ~74 pixels per probe cell. Two
scans were collected, one using a 530 DF 25 nm bandpass
filter, and the second using a 560 nm longpass filter.

Signals were deconvoluted to remove spectral overlap and
average counts per cell determined. The sample probe
intensities were scaled to the reference intensities as
follows: a histogram of the base-10 logarithm of the intensity
ratios for each pair of probes was constructed. The histogram
had a mesh size of 0.01, and was smoothed by replacing the
value at each point with the average number of counts over a
five-point window centered at that point. The highest value
in the histogram was located, and the resulting intensity
ratio was taken to be the most probable calibration
coefficient.
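
The scaling procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the use of the bin center as the reported ratio, and the choice to divide the sample intensities by the coefficient are not taken from the text, which specifies only the log10 histogram, the 0.01 mesh, the five-point smoothing and the selection of the highest value.

```python
import math
from collections import Counter


def calibration_coefficient(sample, reference, mesh=0.01):
    """Most probable sample/reference intensity ratio (hypothetical helper)."""
    # Histogram of log10(sample / reference), one bin per `mesh` units.
    counts = Counter(
        math.floor(math.log10(s / r) / mesh)
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    lowest, highest = min(counts), max(counts)

    # Five-point smoothing: average count over a window centered on each bin.
    def smoothed(k):
        return sum(counts[k + d] for d in range(-2, 3)) / 5.0

    # Locate the highest smoothed value; ties resolve to the lowest bin.
    peak = max(range(lowest, highest + 1), key=smoothed)
    # Assumption: report the ratio at the center of the peak bin.
    return 10 ** ((peak + 0.5) * mesh)


def scale_to_reference(sample, reference):
    """Divide sample intensities by the calibration coefficient (assumed usage)."""
    c = calibration_coefficient(sample, reference)
    return [s / c for s in sample]
```

Because the coefficient is taken from the mode of the ratio distribution rather than its mean, probes that fall inside a mismatch footprint (and therefore have atypical ratios) have little influence on the scaling.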

The data are shown in Fig. 25 for L strand targets
hybridized to H strand probes, from a portion of hypervariable
region I in the mitochondrial control region. Numbering is
conventional. In each plot, the reference target intensities
are shown in red and the sample in blue. The reference, MT1,
is a perfect match to the p° probes. Fig. 25A - Comparison of
ief007 to MT1. There is a single base difference between the
two target sequences, located at position 16,223 (MT1 C:
ief007 T). This results in a "footprint" spanning ~20
positions, 11 to the left and 8 to the right of position
16,223, in which the ief007 p° intensities are decreased by a
factor of more than 10 relative to the MT1 intensities.
The size and location of the footprint are consistent with a
single base mismatch affecting hybridization to P 20,9 probes.
The theoretical footprint location is indicated by the grey
bar, and the location of the polymorphism is shown by a
vertical black line within the bar. The size of a footprint
changes with probe length, and its relative position with
interrogation position (not shown). Because the sample and
reference targets are in competition, the MT1 signal in a
footprint region actually increases as a proportion of total
signal in each probe cell, because the mismatched sample
target no longer competes effectively for probe sites.
Fig. 25B - Comparison of ha001 to MT1. The ha001 target has 4
polymorphisms relative to MT1. The p° intensity pattern
clearly shows two regions of difference between the targets.
Furthermore, it can easily be seen that each region contains
≥ 2 differences, because in both cases the footprints are longer
than 20 positions, and therefore are too extensive to be
explained by a single base difference. The effect of
competition can be seen by comparing the MT1 intensities in
the ief007 and ha001 experiments: the relative intensities of
MT1 are greater in panel B where ha001 contains p° mismatches
but ief007 does not. Fig. 25C - The ha004 sample has multiple
differences to MT1, resulting in a complex pattern extending
over most of the region shown. Thus, differences are clearly
detected, even though basecalling might be compromised using
only the 4N tiling array. Even when patterns are highly
complex, samples can be compared and matched. For example, if
the ha004 sample is compared to ha004 as a reference in the
same experiment, the p° pattern indicates a match, even though
the effect of multiple mismatches might compromise direct
sequence reading (not shown).

Detection of a 2 bp-deletion 

Experimental details are as above. The results are
shown in Figure 26. Figure 26A shows that although the 4N
array was not designed to detect length polymorphisms, this
common 2 bp length polymorphism located at 514-523 in the
mitochondrial DNA was easily detected by the presence of a p°
intensity footprint. Figure 26B shows target-specific effects
on hybridization. A 2 bp difference between the MT1 reference
(GG) and the ha002 target (AA) is associated with a complex
footprint pattern: the p° signals of the mismatched ha002
hybrids are up to 10-fold higher than those of the perfectly
matched MT1 hybrids in a region extending leftwards from
position 16,381. Both samples were hybridized simultaneously
to the same array. In addition, the effect extends beyond the
probes directly affected by the mismatches. Therefore, we
conclude that the difference is due to changes in target
secondary structure. Increased accessibility of the target as
a result of disruption of base pairing between inverted
repeats (shaded on the diagram) could explain the increase in
the sample versus the reference p° signals.

Hybridization of 16.3 kb of mitochondrial target sequence to
the whole genome chip

Figure 27A shows an image of the array, actual size,
hybridized to L strand target sample. The 1.28 x 1.28 cm P 20,9
tiled array contains a total of 134,688 probes, each
synthesized in a 35 x 35 micron cell. The number of probes is
sufficient to represent the 16.6 kb genome twice over. The
array has the capacity for sense and antisense coding. The
16,569 bp map of the genome is shown and the H strand origin
of replication (OH), located in the control region, is
indicated. Figure 27B - A portion of the hybridization
pattern is shown, magnified. The scale is indicated by the
bar on the left hand side. Most of the array can be read
directly. The image, which was generated by the galvanometer
scanner detection system in under 2 minutes, was collected at
~3 micron, 16 bit pixel resolution, providing ~100 pixels of
intensity data for each probe cell in the array. Fluorescence
was detected through a 581 DF 52 nm bandpass filter. Figure
27C - The ability of the array to detect and read single base
differences in a 16.3 kb sample is illustrated. Two different
target sequences were hybridized in parallel to different
chips. The hybridization patterns are compared for four
different positions in the sequence. The top panel of each
pair shows the hybridization of the MT3 target, which
matches the chip p° sequence at these positions. The lower
panel shows the pattern generated by a sample from a patient
with Leber's Hereditary Optic Neuropathy (LHON). Three
pathogenic mutations are implicated, LHON3460, LHON4216, and
LHON13708. All three are clearly detected. For comparison,
the third panel in the set shows a region that is identical in
both samples, around position 11,778.

The pattern matching techniques described above also
provide a method of determining whether the nucleic acid
sequence of a biological sample is homozygous or heterozygous
for a particular allele, i.e., to identify the presence of a
polymorphism in the nucleic acid sequence at a particular
position. In this regard, polymorphisms can be identified in
both coding and noncoding sections of the sample nucleic acid,
i.e., in exons or introns. This is of value, for example, in
identifying whether a genetically linked disease is present.
Of course, it will be recognized that any genetically related
condition, i.e., other than those thought of as "diseases", can
be identified by such a method.

Fig. 34 shows a computer-implemented flowchart of a
method of identifying the presence of a polymorphism in a
nucleic acid sequence from a patient sample. The flowcharts
described herein are for illustration purposes and not
limitation. For example, for simplicity Fig. 34 describes
analyzing one base position at a time to detect polymorphisms.
However, each step may also be performed for the entire
nucleic acid sequence and/or some steps may be combined.

At step 200, the system selects a base position in
the nucleic acid sequence from the patient sample. At step
202, the system determines the difference between the
hybridization intensities of the nucleic acid sequence from
the patient sample and a corresponding nucleic acid sequence
from a wild type sample to an array of reference nucleic acid
probes. Although the reference nucleic acid probes may be
perfectly complementary to the wild type sample, this is not
required.

The system derives or calculates a ratio of the
difference determined at step 202 to the hybridization
intensity of the nucleic acid sequence from the wild type
sample. The ratio is derived at step 204 and it indicates how
close the hybridization intensities for the nucleic acid
sequences from the patient and wild type samples are to being
the same.

An assigned value is utilized to determine if the
ratio indicates that there is a polymorphism at the base
position. The assigned value may be user specified and at
step 206, the system compares the ratio to the assigned value.
If the ratio is greater than the assigned value, the system
identifies the presence of a polymorphism at the base position
of the nucleic acid sequence from the patient sample. At step
210, the system determines if there is a next base position to
analyze.

By way of example and not limitation, one can screen
nucleic acid samples from a cancer patient to determine
whether the DNA repair genes MSH2 or MLH1 are mutated. This is
done by comparing the hybridization pattern of patient DNA
from the appropriate region to the hybridization pattern of
DNA from the same region of a healthy (i.e., wild-type)
sample. Figures 28-31 show such comparisons of patient DNA
samples from heterozygous MSH2, MLH1, MSH2 and p53 genes and
their corresponding wild type genes. The screening can be
against any reference sequence immobilized on the chip, though
as described earlier it will be advantageous to use a chip in
which the reference sequence is complementary to the wild type
sequence. The hybridization intensities corresponding to each
base position are determined for each sample as described
earlier. One then determines the difference between the
intensities for the patient sample and the wild type sample at
each base position and compares that to the wild type
intensity at that base position. This ratio can vary between
one and zero, being zero if the wild type and patient
sequences are identical in this region (since hybridization
will be identical for both samples) and approaching one if
there is a complete mismatch, i.e., no hybridization at all
between patient sample and the reference sequence in that
region. If this ratio is greater than an assigned value, it
indicates a polymorphism at that particular base position.
Typically, this assigned value is set at about 0.5, preferably
0.6. Positions at which such polymorphisms are present can be
identified by plotting this ratio versus the corresponding
base position for all positions where the ratio is greater
than about 0.25. If the ratio is less than 0.25, this is
considered to be statistical noise. Typically such plots will
show a spike, with a maximum ratio of about 0.5, centered
approximately around the site of the polymorphism. The plots
are made with variables derived as follows:

Y axis: y = (WT intensity - PS intensity)/ WT intensity 
X axis: x = base position 

where WT = wild type and PS = patient sample 
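
The ratio defined above lends itself to a short computation. The following Python sketch is an illustration only: the function names, the list-based inputs and the 1-based position numbering are assumptions, while the formula, the assigned value of about 0.5 and the noise floor of about 0.25 come from the description.

```python
def difference_ratios(wt, ps):
    """y = (WT intensity - PS intensity) / WT intensity at each base position."""
    return [(w - p) / w for w, p in zip(wt, ps)]


def points_to_plot(wt, ps, noise_floor=0.25, first_position=1):
    """(position, ratio) pairs above the noise floor, i.e. the points plotted."""
    return [
        (first_position + i, y)
        for i, y in enumerate(difference_ratios(wt, ps))
        if y > noise_floor
    ]


def candidate_positions(wt, ps, assigned_value=0.5, first_position=1):
    """Base positions whose ratio exceeds the assigned value."""
    return [
        first_position + i
        for i, y in enumerate(difference_ratios(wt, ps))
        if y > assigned_value
    ]
```

For example, with wild type intensities of 1000 at five positions and patient intensities of 980, 900, 400, 350 and 950, the ratios are 0.02, 0.10, 0.60, 0.65 and 0.05, so only positions 3 and 4 exceed an assigned value of 0.5.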

The technique has been refined further to provide a
higher level of accuracy by determining hybridization
intensities from both the sense and antisense strands of the
DNA sample and requiring that the spike occur in both strands
at the same respective complementary positions. The probes on
the chip are typically 10-20 mers and therefore create a
"footprint" as one tiles through the position where the
polymorphism is present, i.e., there will be a difference
between the hybridization intensities of the patient sequence
and the wild type sequence in this region. As a result, an
even higher level of confidence that a polymorphism occurs at
a particular base position is obtained by requiring that the
hybridization intensity ratio derived above be greater than
the assigned value, 0.5 in this example, for at least two
adjacent base positions, preferably three adjacent positions.
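
A minimal sketch of this two-part refinement is given below. It assumes, for illustration, that the sense and antisense ratios have already been computed and mapped onto the same base-position index; the function names and the 0-based indexing are hypothetical.

```python
def runs_above(ratios, assigned_value=0.5, min_run=2):
    """(start, end) index pairs of runs of at least `min_run` adjacent
    positions whose ratio exceeds the assigned value."""
    runs, start = [], None
    # A sentinel below any ratio closes a run that reaches the last position.
    for i, y in enumerate(list(ratios) + [float("-inf")]):
        if y > assigned_value:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


def confirmed_positions(sense, antisense, assigned_value=0.5, min_run=2):
    """Positions lying in a qualifying run on both strands."""
    def covered(ratios):
        return {
            i
            for a, b in runs_above(ratios, assigned_value, min_run)
            for i in range(a, b + 1)
        }
    return sorted(covered(sense) & covered(antisense))
```

Setting `min_run=3` corresponds to the preferred requirement of three adjacent positions; an isolated single-position spike on either strand is discarded.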

Mismatch Detection by Tiled Arrays 

In this example, a reference target T0 and three
mutant targets T1, T2, and T3 are provided. T1 has a
substitution at position 11, T2 at position 8, and T3 at
positions 8 and 11. In writing the mutant sequences, the
position of the substitution is noted by S. These sequences
are depicted in Table 6.

Table 6

Substitutions in Mutant Sequences

     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

T0
T1                                 S
T2                        S
T3                        S        S



Each of these targets is hybridized with a DNA chip
containing a tiled array of probes. For simplicity, a P 7 4
chip is described herein. The superscript 7 indicates that
the chip contains a tiled array of 7-mer probes that are
perfectly or partially complementary to the reference target.
The subscript 4 denotes the interrogation position, such that
the nucleotide at position 4 of each 7-mer is varied (A, T, G,
or C in four different synthesis cells).

The number of target-probe mismatches is given in
Table 7 below. The top row gives the number of mismatches for
the best-match case (i.e., for the most complementary probe of
each set of four) and the second row gives the number of
mismatches for the other three probes in the set.

Table 7
Target-Probe Mismatches

T0           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

versus T0    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
             1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1

versus T1    0  0  0  0  0  0  0  1  1  1  0  1  1  1  0  0  0  0  0  0  0
             1  1  1  1  1  1  1  2  2  2  1  2  2  2  1  1  1  1  1  1  1

versus T2    0  0  0  0  1  1  1  0  1  1  1  0  0  0  0  0  0  0  0  0  0
             1  1  1  1  2  2  2  1  2  2  2  1  1  1  1  1  1  1  1  1  1

versus T3    0  0  0  0  1  1  1  1  2  2  1  1  1  1  0  0  0  0  0  0  0
             1  1  1  1  2  2  2  2  3  3  2  2  2  2  1  1  1  1  1  1  1

Shown below in Table 8 is the number of mismatches
in the best-match case for the hybridization of P 7 4 with a
series of targets containing two substitutions at different
separations:

Table 8

Best Match Hybridization

T0           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

                                  S  S
                         1  2  2  1  1  2  2  1

                                  S     S
                         1  1  2  1  2  1  2  1  1

                                  S        S
                         1  1  1  1  2  2  1  1  1  1

                                  S           S
                         1  1  1  0  2  2  2  0  1  1  1

                                  S              S
                         1  1  1  0  1  2  2  1  0  1  1  1

                                  S                 S
                         1  1  1  0  1  1  2  1  1  0  1  1  1



When a target is hybridized to a 7-mer chip P 7
containing the tiled reference sequence, the number of
mismatches in this case is the same as that given by the
best-match case above except for an additional mismatch at each
substitution position (see Table 9). For T3, for example, P 7
has one more mismatch at positions 8 and 11, where T3 has
substitutions, than does P 7 4.

Table 9

Mismatches at Substitution Position

                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

                                      S        S
P 7 versus T3    0  0  0  0  1  1  1  2  2  2  2  1  1  1  0  0  0  0  0  0  0

Thus, to generalize, for a P n j chip, and a target
containing a substitution at position a, the best-match set of
probes will contain 1 mismatch from a-1 to a-k, 0 at a, and 1
mismatch from a+1 to a+m, where k=j-1 and m=n-j. For P 7 4, k=3
and m=3, and so the 1 mismatch zone is from a-1 to a-3, and
from a+1 to a+3, with no mismatch at a, the interrogation
position.
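
The general rule for a single substitution can be written as a small Python function. This is a sketch for illustration: the function names and the dictionary and string representations are assumptions, while the relations k = j - 1 and m = n - j are those stated above.

```python
def single_substitution_profile(a, n, j):
    """Best-match mismatch count per tiling position for one substitution
    at position `a` on a P n j chip (n-mer probes, interrogation base j)."""
    k, m = j - 1, n - j
    return {pos: (0 if pos == a else 1) for pos in range(a - k, a + m + 1)}


def render(profile, length):
    """Mismatch counts for positions 1..length as a digit string."""
    return "".join(str(profile.get(pos, 0)) for pos in range(1, length + 1))
```

With n = 7, j = 4 and a substitution at position 11, `render(single_substitution_profile(11, 7, 4), 21)` gives `000000011101110000000`, the best-match row of Table 7 for T1. The zone always spans k + m + 1 = n positions.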

The effects of multiple substitutions are additive.
Thus, for example, using a P 16 10 chip and a target containing
substitutions at positions a and b, where k is 9 and m is 6,
the effect of the substitution at a is to give 1 mismatch from
a-1 to a-9, and from a+1 to a+6. The substitution at b will
give 1 mismatch from b-1 to b-9, and from b+1 to b+6. If a
and b are at positions 100 and 108, their effect is the sum,
as shown in Table 10.

Table 10
Effects of Multiple Substitutions

Position    Mismatches

91-98       1
99          2
100         1
101-106     2
107         1
108         0
109-114     1
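
The additive rule can be checked with a short Python sketch that sums the single-substitution zones and groups the result into position ranges, in the manner of Table 10. The function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def mismatch_profile(substitutions, n, j):
    """Sum of best-match mismatch zones for several substitutions on a
    P n j chip, with k = j - 1 and m = n - j."""
    k, m = j - 1, n - j
    profile = Counter()
    for a in substitutions:
        for pos in range(a - k, a + m + 1):
            # Every probe overlapping `a` has one mismatch, except the
            # best-match probe whose interrogation base sits at `a`.
            profile[pos] += 0 if pos == a else 1
    return profile


def as_ranges(profile):
    """Collapse adjacent positions with equal counts into (start, end, count)."""
    ranges = []
    for pos in sorted(profile):
        count = profile[pos]
        if ranges and ranges[-1][1] == pos - 1 and ranges[-1][2] == count:
            ranges[-1] = (ranges[-1][0], pos, count)
        else:
            ranges.append((pos, pos, count))
    return ranges
```

Calling `as_ranges(mismatch_profile([100, 108], 16, 10))` yields the ranges 91-98 (1), 99 (2), 100 (1), 101-106 (2), 107 (1), 108 (0) and 109-114 (1).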

Thus, given a hybridization pattern, the location of
substitution mutations can be determined as follows.

(a) The first step is to hybridize a P n j chip with
the reference target, and another P n j chip with the unknown
target. Alternatively, a single P n j chip could be hybridized
with a mixture of differently labeled reference and unknown
targets (e.g., a red-fluorescent reference target and a
green-fluorescent unknown target). By using a pair of chips, or a
pair of suitably labeled targets, one can readily identify
probes that contain mismatches and distinguish between 0, 1, 2
and a larger number of mismatches.

(b) A substitution at location a is identified by
the presence of a 1-mismatch zone that is n residues long
except for a 0-mismatch cell at residue a. The probe giving
the highest intensity at residue a identifies the
substitution.

(c) A "quiet zone" (i.e., where the unknown target
exhibits 1 or more mismatches) that is longer than n must
contain at least two substitutions (the effects of insertions
and deletions are considered below). The differences between
P n j and P n reveal the sites of the substitutions. Again, the
probe of P n j giving the highest intensity at each of these
sites identifies the substitution. An example is provided in
Table 11 below.



Table 11

             5  6  7  8  9 10 11 12 13 14 15 16

Target                S              S
P 7          1  1  1  1  1  2  2  1  1  1  1  1
P 7 4        1  1  1  0  1  2  2  1  0  1  1  1
P 7 - P 7 4  0  0  0  1  0  0  0  0  1  0  0  0

P 7 - P 7 4, the difference between the tiled reference
sequence and the best-case match of the tiled array, exhibits
1's at positions 8 and 13 and 0's elsewhere, showing that
substitutions have occurred at these two positions. Their
identity is established by seeing which of the four
nucleotides at these interrogation positions has the highest
intensity.

(d) Further information can be obtained by 
hybridizing a generic chip, such as one containing all 10-mers 
of DNA, with both the reference target and the unknown target. 
The difference in hybridization patterns identifies probes 
that span mutation sites. 
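
Steps (b) and (c) can be illustrated with a short Python sketch operating on mismatch counts. It assumes, for illustration only, that the number of mismatches at each tiling position has already been inferred from the hybridization intensities as in step (a); the function names are hypothetical.

```python
def substitution_sites(reference_tiling, best_match, first_position=1):
    """Positions where the P n count exceeds the P n j best-match count
    by exactly one mismatch, i.e. the inferred substitution sites."""
    return [
        first_position + i
        for i, (p_n, p_nj) in enumerate(zip(reference_tiling, best_match))
        if p_n - p_nj == 1
    ]


def longest_quiet_zone(reference_tiling):
    """Length of the longest run of positions showing 1 or more mismatches."""
    longest = current = 0
    for count in reference_tiling:
        current = current + 1 if count >= 1 else 0
        longest = max(longest, current)
    return longest
```

Applied to the Table 11 rows (P 7 = 111112211111 and P 7 4 = 111012210111, starting at position 5), `substitution_sites` returns positions 8 and 13, and the quiet zone of 12 positions is longer than n = 7, consistent with at least two substitutions.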
Identifying Species Utilizing Generic Probe Arrays

It has been determined that generic high density DNA 
probe arrays may be utilized to identify species of isolates. 
By "generic" it is meant that the probe array was not 
specifically designed to identify species within the genus of 
interest. For example, a probe array that includes all 
nucleic acid probes ten nucleotides in length would be a 
generic probe array. Additionally, a probe array for an 
entirely different purpose may be utilized as a generic probe 
array. Thus, a probe array for detecting mutations in HIV may 
be utilized to identify species in Mycobacterium. 

Given multiple isolates whose species one wants to
determine, the isolates are first hybridized
with the generic probe array to obtain hybridization
intensities as described above. Typically, the hybridization
intensities will then be normalized across the isolates. It
has been determined that analyzing each hybridization
intensity from the experiments may not be computationally
feasible, or at least economically feasible. Accordingly, the
invention reduces the number of variables to analyze.

In one embodiment, for each probe in the generic
probe array, the mean and variance of the hybridization
intensities across the isolates are calculated. The probes
that demonstrate the most variance are then selected and the
corresponding hybridization intensities are utilized to
cluster the isolates into species. Thus the invention
utilizes the hybridization intensities from probes that have
the most varying hybridization intensities. One may first
specify a number of probes which one believes could be
processed by the equipment available. Then, the invention
would select the hybridization intensities from that number of
probes which will provide the most discriminating information.

This process may be generally utilized to assign
groups to multiple isolates, where the groups are species,
subspecies, phenotypes, genotypes, and the like. For
illustration purposes, the following will describe an
embodiment that identifies the species of isolates.

Fig. 36 shows a computer-implemented flowchart of a
method of identifying species to which organisms belong. At
step 400, hybridization intensities indicating hybridization
affinity between multiple isolates and a generic probe array
are input. Optionally, the hybridization intensities are then
standardized or normalized at step 402 to reduce the
variability between the experiments. For example, the
hybridization intensities may be standardized to a common mean
and variance by Z-score analysis or normalization.

The system selects hybridization intensities that
have the most variance across the isolates at step 404.
Determining which hybridization intensities vary the most may
be done in any number of ways, including calculating a mean and
variance. As an example, the number of hybridization
intensities to analyze may be reduced from 10,000 to 20, which
drastically reduces the computational time required to analyze
the hybridization intensities.
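
Steps 402 and 404 can be sketched in Python as follows. This is an illustration under stated assumptions: each isolate is represented as a list of intensities in the same probe order, the Z-score is computed per isolate, and the population variance is used to rank probes; the function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def zscore(intensities):
    """Standardize one isolate's intensities to mean 0 and variance 1.
    Assumes the intensities are not all identical."""
    mu, sigma = mean(intensities), pstdev(intensities)
    return [(x - mu) / sigma for x in intensities]


def most_variable_probes(isolates, count):
    """Return (selected probe indices, reduced intensity matrix).

    `isolates` is a list of per-isolate intensity lists in the same probe
    order; the `count` probes with the largest variance across isolates
    are kept.
    """
    normalized = [zscore(row) for row in isolates]
    variances = [pvariance(column) for column in zip(*normalized)]
    ranked = sorted(range(len(variances)), key=lambda p: variances[p], reverse=True)
    selected = sorted(ranked[:count])
    reduced = [[row[p] for p in selected] for row in normalized]
    return selected, reduced
```

The reduced matrix, with one short row per isolate, is what would then be passed to the clustering at step 406.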

At step 406, the species of each of the multiple
isolates is determined according to the selected hybridization
intensities. Clustering algorithms may be utilized to cluster
the isolates into species. As an example, Principal
Components analysis and Variable Clustering analysis may be
utilized. The purpose of clustering is to place the isolates
into groups or clusters suggested by the data, not defined a
priori, such that isolates in a given cluster tend to be
similar and isolates in different clusters tend to be
dissimilar. Thus, no a priori classification is required.

Isolates of Mycobacterium have been analyzed and 
Fig. 37 shows a hierarchical clustering of these isolates. 
The height of the cluster represents the average distance 
between the clusters. 
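
As an illustration of a hierarchical clustering whose merge height is the average distance between clusters, the following Python sketch uses average-linkage agglomeration with Euclidean distance. Both choices are assumptions made for this example; the description does not state which algorithm or distance measure produced Fig. 37.

```python
import math
from itertools import combinations


def average_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, height),
    where height is the average pairwise distance between the two clusters."""
    clusters = [(i,) for i in range(len(points))]

    def distance(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters; ties resolve to the first pair found.
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((a, b, distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

Each point would be one isolate's vector of selected, normalized hybridization intensities. Isolates that merge at a low height are similar; a large jump in height between successive merges suggests a boundary between groups.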

The foregoing invention has been described in some
detail by way of illustration and example, for purposes of
clarity and understanding. It will be obvious to one of skill
in the art that changes and modifications may be practiced
within the scope of the appended claims. Therefore, it is to
be understood that the above description is intended to be
illustrative and not restrictive. The scope of the invention
should, therefore, be determined not with reference to the
above description, but should instead be determined with
reference to the following appended claims, along with the
full scope of equivalents to which such claims are entitled.

All patents, patent applications and publications
cited in this application are hereby incorporated by reference
in their entirety for all purposes to the same extent as if
each individual patent, patent application or publication were
so individually denoted.

WHAT IS CLAIMED IS:



1. A method for identifying a genotype of a first
organism, comprising:
(a) providing an array of oligonucleotides at known
locations on a substrate, said array comprising probes
complementary to reference DNA or RNA sequences from a second
organism;
(b) hybridizing a target nucleic acid sequence from
the first organism to the array; and
(c) based on an overall hybridization pattern of the
target to the array, identifying the genotype of the first
organism, and optionally identifying a phenotype of the first
organism.

2. The method of Claim 1, wherein the second
organism is Mycobacterium tuberculosis.

3. The method of Claim 2, wherein the reference
DNA or RNA sequences are selected from the group consisting of
16S rRNA, the rpoB gene, the katG gene, the inhA gene, the gyrA
gene, the 23S rRNA gene, the rrs gene, the pncA gene, and the
rpsL gene.

4. The method of Claim 3, wherein the phenotype is
resistance to an antibiotic drug.

5. The method of Claim 4, wherein the drug is
selected from the group consisting of rifampicin, rifabutin,
isoniazid, streptomycin, pyrazinamide, and ethambutol.

6. The method of Claim 1, wherein the overall
hybridization pattern is derived by comparing a hybridization
pattern of the target nucleic acid sequence to a hybridization
pattern of the reference sequence.



7. The method of Claim 6, wherein the comparing
identifies one or more positions at which a residue in the
target nucleic acid differs from a corresponding residue in
the reference sequence.

8. The method of Claim 7, wherein the comparing is
used to derive one or more sets of differences between the
target nucleic acid and the reference sequence, each set being
associated with a probability that the target belongs to a
particular species of the first organism.

9. The method of Claim 8, wherein the probability
associated with each set of differences is used to derive a
combined probability greater than a desired confidence level
that the target belongs to a particular species.

10. The method of Claim 8, wherein the comparing is
used to derive one or more sets of differences between the
target nucleic acid and the reference sequence, each set being
associated with a probability that the target possesses a
particular phenotype.

11. The method of Claim 10, wherein the probability
associated with each set of differences is used to derive a
combined probability greater than a desired confidence level
that the target possesses a particular phenotype.

12. The method of Claim 7, wherein the comparing
identifies one or more species-specific polymorphisms and
these species-specific polymorphisms are used to confirm the
identification.

13. The method of Claim 7, wherein the comparing
identifies one or more shared polymorphisms and these shared
polymorphisms are used to confirm the identification.

.1 14. The method of Claim 6, wherein the 

2 hybridization pattern of the target to a first region of the 

3 array is used to derive a probability that the target belongs 

4 to a particular species; 
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5 repeating this for other regions of the array until 

6 the combination of probabilities derived from all the regions 
7" indicating that the organism belongs to a particular species 
8 exceeds a desired confidence level. 
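
The region-by-region accumulation recited in Claim 14 can be illustrated with a minimal Python sketch. The claim does not say how the per-region probabilities are combined; the sketch assumes, purely for illustration, that regions are independent so that probabilities multiply and are then renormalized. The function name `identify_species`, the input format and the default confidence level are all hypothetical.

```python
def identify_species(region_probabilities, confidence=0.99):
    """Combine per-region species probabilities until one species is confident.

    region_probabilities: list of dicts mapping species name -> probability
    derived from one array region (hypothetical input format).
    Returns (species or None, combined probability, number of regions used).
    """
    combined = {}
    used = 0
    for used, probs in enumerate(region_probabilities, start=1):
        if not combined:
            combined = dict(probs)
        else:
            # Assumption: regions are independent, so probabilities multiply.
            combined = {s: combined[s] * probs.get(s, 0.0) for s in combined}
        total = sum(combined.values())
        if total == 0:
            return None, 0.0, used
        # Renormalize so the combined values again sum to one.
        combined = {s: p / total for s, p in combined.items()}
        best = max(combined, key=combined.get)
        if combined[best] >= confidence:
            return best, combined[best], used
    best_p = max(combined.values()) if combined else 0.0
    return None, best_p, used
```

With every region favouring one species at 0.7 against 0.3, the combined probability under this assumption first exceeds 0.99 after the sixth region.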

15. The method of Claim 14, wherein each region
corresponds to oligonucleotide probes which detect the
presence or absence of between three and fifteen contiguous
residues in the target nucleic acid.

16. The method of Claim 1, wherein the reference
DNA or RNA sequences are from a highly conserved gene.

17. The method of Claim 1, wherein the target
nucleic acid is amplified from a biological sample.

18. The method of Claim 17, wherein the target
nucleotide is fluorescently labelled.

19. The method of Claim 1, wherein the
oligonucleotides are from about 5 to 25 nucleotides in length.

20. The method of Claim 1, wherein the hybridizing
is performed in a fluid volume of 250 nl or less.

21. The method of Claim 1, wherein the array has
between 100 and 1,000,000 probes.

22. The method of Claim 21, wherein the array has
approximately 2,800 probes.

23. The method of Claim 1, wherein the probes are
linked to the support via a spacer.



24. The method of Claim 1, wherein the overall
hybridization pattern is derived by:

(a) determining hybridization intensities of the
target nucleic acid sequence to each of a set of selected
probes; and

(b) comparing said hybridization intensities to
corresponding hybridization intensities of the reference
sequence to said set of selected probes.

25. The method of Claim 24, wherein the set of
selected probes interrogates a continuous segment of the
reference sequence.

26. The method of Claim 1, wherein the overall
hybridization pattern is derived by determining the maximum
hybridization intensity produced from a group of probes which
interrogate a common nucleotide position of the target
sequence, repeating this for other nucleotide positions in the
target, and plotting the determined maximum hybridization
intensities as a function of the corresponding nucleotide
position being interrogated to provide a target sequence plot
of hybridization intensity vs. nucleotide position.
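
As an illustration of the profile described in Claim 26, the following Python sketch takes the maximum intensity among the probes interrogating each nucleotide position and returns (position, intensity) pairs ready for plotting. The function name and the nested-dictionary input format are assumptions made for this example only.

```python
def max_intensity_profile(probe_intensities):
    """Return [(position, max intensity)] sorted by nucleotide position.

    probe_intensities: dict mapping nucleotide position -> dict mapping the
    probe's interrogation base -> measured hybridization intensity
    (hypothetical input format).
    """
    return [
        (position, max(by_base.values()))
        for position, by_base in sorted(probe_intensities.items())
    ]


# Example: two interrogated positions, four probes each.
profile = max_intensity_profile({
    11: {"A": 40.0, "C": 950.0, "G": 35.0, "T": 60.0},
    10: {"A": 800.0, "C": 50.0, "G": 45.0, "T": 30.0},
})
```

Running the same function on intensities measured for the reference sequence would give the baseline profile against which the target profile is compared.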

27. The method of Claim 26, further comprising
repeating the steps of Claim 37 with the target sequence
replaced by the reference sequence, to derive a baseline plot
of the reference sequence and comparing the target plot to the
baseline plot.

28. The method of Claim 27, wherein the common
nucleotide positions form a continuous segment.

29. A method for identifying the genotype and/or
phenotype of an organism by comparing a target nucleic acid
sequence from a first organism coding for a gene (or its
complement) to a reference sequence coding for the same gene
(or its complement) from a second organism, said method
comprising:

(a) hybridizing a sample comprising the target
nucleic acid or a subsequence thereof to an array of
oligonucleotide probes immobilized on a solid support, the
array comprising:

a first probe set comprising a plurality of probes,
each probe comprising a segment of nucleotides exactly
complementary to a subsequence of the reference sequence, the
segment including at least one interrogation position
complementary to a corresponding nucleotide in the reference
sequence;

(b) determining which probes in the first probe set
bind to the target nucleic acid or subsequence thereof
relative to their binding to the reference sequence, such
relative binding indicating whether a nucleotide in the target
sequence is the same or different from the corresponding
nucleotide in the reference sequence;

(c) based on differences between the nucleotides of
the target sequence and the reference sequence identifying the
phenotype of the first organism;

(d) deriving one or more sets of differences
between the reference sequence and the first organism; and

(e) comparing the set of differences to a data base
comprising sets of differences correlated with speciation of
organisms to identify the genotype of the first organism.

30. The method of Claim 29, wherein the second
organism is Mycobacterium tuberculosis.

31. The method of Claim 29, wherein the gene is
selected from the group consisting of 16S rRNA, the rpoB gene,
the katG gene, the inhA gene, the gyrA gene, the 23S rRNA gene,
the rrs gene, the pncA gene, and the rpsL gene.

32. The method of Claim 29, wherein the phenotype
is resistance to an antibiotic drug.

33. The method of Claim 32, wherein the drug is
selected from the group consisting of rifampicin, rifabutin,
isoniazid, streptomycin, pyrazinamide, ethambutol.

34. The method of Claim 29, wherein the reference
DNA or RNA sequences are from a highly conserved gene.

35. The method of Claim 29, wherein each set of
differences is associated with a probability that the target
belongs to a particular species of the first organism.

36. The method of Claim 35, wherein the probability
associated with each set of differences is used to derive a
combined probability greater than a desired confidence level
that the target belongs to a particular species.

37. The method of Claim 29, wherein the comparing
identifies one or more species-specific polymorphisms and
these species-specific polymorphisms are used to confirm the
identification.

38. The method of Claim 29, wherein the comparing
identifies one or more shared polymorphisms and these shared
polymorphisms are used to confirm the identification.

39. The method of Claim 29, wherein the target
nucleic acid is amplified from a biological sample.

40. The method of Claim 39, wherein the target
nucleic acid is fluorescently labelled.

41. The method of Claim 29, wherein the
oligonucleotides are from about 5 to 25 nucleotides in length.

42. The method of Claim 29, wherein the hybridizing
is performed in a fluid volume of 250 µL or less.

43. The method of Claim 29, wherein the array has
between 100 and 1,000,000 probes.

44. The method of Claim 42, wherein the array has
approximately 2,800 probes.

45. The method of Claim 29, wherein the probes are
linked to the support via a spacer.

46. The method of Claim 29, wherein the array 
further comprises a second, a third and a fourth probe sets 
each comprising a corresponding probe for each probe in the 
first probe set, the corresponding probes in the second, third 
and fourth probe sets being identical in sequence to the 
corresponding probe in the first probe set or a subsequence of 
nucleotides thereof that includes the at least one 
interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe 
sets, and determining which probes, relative to one another, 
in the four probe sets specifically bind to the target nucleic 
acid or subsequence thereof, the relative specific binding of 
the corresponding probes in the four probe sets indicating 
whether a nucleotide in the target sequence is the same or 
different from the corresponding nucleotide in the reference 
sequence. 
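
The four-probe arrangement of Claim 46 can be sketched in Python as follows. For each interrogated reference position the sketch builds four probes that are identical except at the interrogation position, and a base call is made from whichever probe gives the strongest signal. The helper names, the odd probe length with a central interrogation position, and the use of a simple maximum for "relative specific binding" are all assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(reference, probe_len=15):
    """Build four probes per interrogated position of the reference.

    Returns a list of (position, {target_base: probe}) where each probe is
    exactly complementary to the reference window with target_base placed at
    the central interrogation position. probe_len is assumed to be odd.
    """
    half = probe_len // 2
    tiles = []
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        probes = {}
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            probes[base] = reverse_complement(variant)
        tiles.append((center, probes))
    return tiles


def call_base(intensities):
    """Call the target base as the one whose probe hybridized most strongly."""
    return max(intensities, key=intensities.get)


tiles = tile_probes("ACGTACGTA", probe_len=5)
```

In this layout the probe keyed by the reference base corresponds to the first probe set, and the other three correspond to the second, third and fourth sets.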

47. The method of Claim 46, wherein the array 
further comprises a fifth probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe from the fifth probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of nucleotides thereof that includes the at 
least one interrogation position, except that the at least one 
interrogation position is deleted in the corresponding probe 
from the fifth probe set. 

48. The method of Claim 46, wherein the array
further comprises a sixth probe set comprising a corresponding
probe for each probe in the first probe set, the corresponding
probe from the sixth probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of nucleotides thereof that includes the at
least one interrogation position, except that an additional
nucleotide is inserted adjacent to the at least one
interrogation position in the corresponding probe from the
first probe set.

49. The method of Claim 46, wherein the first probe
set has at least three interrogation positions respectively
corresponding to each of three contiguous nucleotides in a
reference sequence.

50. The method of Claim 46, wherein the first probe
set has at least 50 interrogation positions respectively
corresponding to each of 50 contiguous nucleotides in a
reference sequence.

51. The method of Claim 46, wherein the segment in
each probe of the first probe set that is exactly
complementary to the subsequence of the reference sequence is
9-21 nucleotides.

52. A method for identifying the genotype and/or
phenotype of an organism by comparing a target nucleic acid
sequence from a first organism coding for a gene (or its
complement) to a reference sequence coding for the same gene
(or its complement) from a second organism, said method
comprising:

(a) hybridizing a sample comprising the target
nucleic acid or a subsequence thereof to an array of
oligonucleotide probes immobilized on a solid support, the
array comprising:

a first probe set comprising a plurality of probes,
each probe comprising a segment of nucleotides exactly
complementary to a subsequence of the reference sequence, the
segment including at least one interrogation position
complementary to a corresponding nucleotide in the reference
sequence, wherein each interrogation position corresponds to a
nucleotide position in the reference or target sequence;

(b) determining a hybridization intensity from each
probe;

(c) plotting the hybridization intensities versus
the nucleotide position corresponding to the probe from which
the hybridization intensity was determined to derive a target
plot of hybridization intensity;

(d) repeating steps (a) - (c) with the target
sequence replaced by the reference sequence, to derive a
baseline plot of the reference sequence; and

(e) comparing the target plot to the baseline plot
to identify the genotype and/or phenotype of the organism.

53. The method of Claim 52, wherein the second 
organism is Mycobacterium tuberculosis.

54. The method of Claim 53, wherein the gene is 
selected from the group consisting of 16S rRNA, the rpoB gene,
the katG gene, the inhA gene, the gyrA gene, the 23S rRNA gene,
the rrs gene, the pncA gene, and the rpsL gene. 

55. The method of Claim 54, wherein the phenotype 
is resistance to an antibiotic drug. 

56. The method of Claim 55, wherein the drug is 
selected from the group consisting of rifampicin, rifabutin,
isoniazid, streptomycin, pyrazinamide, ethambutol. 

57. The method of Claim 52, wherein the reference 
DNA or RNA sequences are from a highly conserved gene. 

58. The method of Claim 52, wherein the array 
further comprises a second, a third and a fourth probe sets 
each comprising a corresponding probe for each probe in the 
first probe set, the corresponding probes in the second, third 
and fourth probe sets being identical in sequence to the 
corresponding probe in the first probe set or a subsequence of 
nucleotides thereof that includes the at least one 
interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe
sets, and the hybridization intensity determined in (b) is the
maximum hybridization intensity from each of the corresponding 
probes in the four probe sets. 

59. An array of oligonucleotide probes immobilized 
on a solid support, the array comprising: 

a first probe set comprising a plurality of probes, 
each probe comprising a segment of nucleotides exactly 
complementary to a subsequence of a reference sequence, the 
segment including at least one interrogation position 
complementary to a corresponding nucleotide in the reference 
sequence; 

wherein the reference sequence is a gene from 
Mycobacterium tuberculosis. 

60. The array of Claim 59, wherein the gene is 
selected from the group consisting of 16S rRNA, the rpoB gene,
the katG gene, the inhA gene, the gyrA gene, the 23S rRNA gene,
the rrs gene, the pncA gene, and the rpsL gene. 

61. The array of Claim 60, further comprising:
a second, a third and a fourth probe sets each
comprising a corresponding probe for each probe in the first
probe set, the corresponding probes in the second, third and
fourth probe sets being identical in sequence to the
corresponding probe in the first probe set or a subsequence of
nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets.

62. A method of identifying the presence of a
nucleic acid polymorphism in a patient sample, comprising the
steps of:

(a) determining the difference between the
hybridization intensities of a nucleic acid sequence from the
patient sample and a corresponding nucleic acid sequence from
a wild type sample to an array of reference nucleic acid
probes;

(b) deriving ratios of the difference in (a) to the
hybridization intensity of the wild type sample for each base
position corresponding to each reference nucleic acid probe;
and

(c) identifying the presence of a polymorphism at a
base position corresponding to a reference probe if the ratio
in (b) for the base position corresponding to the reference
probe is greater than or equal to an assigned value.
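
Steps (a) to (c) of Claim 62 amount to a per-position ratio test, which the following Python sketch illustrates. The claim does not fix the sign of the difference; the sketch assumes it is the wild-type intensity minus the patient intensity, so that loss of signal at a mismatch gives a positive ratio. The function name, the 0-based position indexing and the default threshold are hypothetical.

```python
def call_polymorphisms(patient, wild_type, assigned_value=0.25):
    """Return 0-based positions whose intensity-loss ratio meets the threshold.

    patient, wild_type: equal-length lists of hybridization intensities, one
    per reference probe (hypothetical input format).
    """
    if len(patient) != len(wild_type):
        raise ValueError("intensity lists must have the same length")
    flagged = []
    for position, (p, w) in enumerate(zip(patient, wild_type)):
        if w <= 0:
            continue  # ratio undefined without a wild-type signal
        # Assumption: difference taken as wild type minus patient.
        ratio = (w - p) / w
        if ratio >= assigned_value:
            flagged.append(position)
    return flagged


flagged = call_polymorphisms([100.0, 40.0, 95.0], [100.0, 100.0, 100.0], 0.5)
```

Here only the second probe position loses at least half of its wild-type signal, so it is the only position flagged.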

63. The method of claim 62, wherein the
nucleic acid sequence is selected from the group consisting of
mitochondrial DNA, p53, MSH, MLH1, or BRCA-1.

64. The method of claim 62, wherein the
nucleic acid sequence comprises an HIV gene.

65. The method of claim 62, wherein the
nucleic acid sequence comprises a gene associated with a
heritable disease.

66. The method of claim 65, wherein the
heritable disease is cystic fibrosis.

67. A computer program product that identifies the
presence of a nucleic acid polymorphism in a patient sample,
comprising:

computer code that determines the difference between
the hybridization intensities of a nucleic acid sequence from
the patient sample and a corresponding nucleic acid sequence
from a wild type sample to an array of reference nucleic acid
probes;

computer code that derives ratios of the difference
to the hybridization intensity of the wild type sample for
each base position corresponding to each reference nucleic
acid probe;

computer code that identifies the presence of a
polymorphism at a base position corresponding to a reference
probe if the ratio for the base position corresponding to the
reference probe is greater than or equal to an assigned value;
and

a computer readable medium that stores the computer
codes.

68. In a computer system, a method of assigning an 
organism to a group, comprising the steps of: 

inputting groups of a plurality of known nucleic 
acid sequences, the plurality of known nucleic acid sequences 
being from known organisms; 

inputting hybridization patterns for the plurality 
of known nucleic acid sequences, each hybridization pattern 
indicating hybridization of subsequences of the known nucleic 
acid sequence to subsequences of a reference nucleic acid 
sequence; 

inputting a hybridization pattern for a sample 
nucleic acid sequence from the organism indicating
hybridization of subsequences of the sample nucleic acid 
sequence to subsequences of the reference nucleic acid 
sequence; 

comparing the hybridization pattern for the sample 
nucleic acid sequence to the hybridization patterns for the 
plurality of known nucleic acid sequences; and 

assigning a particular group to which the organism 
belongs according to the group of at least one of the known 
nucleic acid sequences that has a hybridization pattern that 
most closely matches the hybridization pattern of the sample 
nucleic acid sequence at specific locations. 
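
The comparing and assigning steps of Claim 68 can be sketched in Python as a nearest-pattern lookup restricted to chosen probe locations. The claim does not define "most closely matches"; the sketch assumes a sum of squared intensity differences. The function name and the record layout with `group` and `pattern` keys are hypothetical.

```python
def assign_group(sample_pattern, known_patterns, locations):
    """Assign the group of the known pattern closest to the sample.

    sample_pattern: list of hybridization intensities for the sample.
    known_patterns: list of dicts with keys "group" and "pattern"
    (hypothetical record layout).
    locations: indices of the probe locations used for the comparison.
    """
    if not known_patterns:
        raise ValueError("at least one known pattern is required")

    def distance(pattern):
        # Assumption: closeness is the sum of squared differences.
        return sum((sample_pattern[i] - pattern[i]) ** 2 for i in locations)

    best = min(known_patterns, key=lambda known: distance(known["pattern"]))
    return best["group"]


known = [
    {"group": "M. tuberculosis", "pattern": [0.9, 0.1, 0.8, 0.2]},
    {"group": "M. avium", "pattern": [0.2, 0.9, 0.1, 0.9]},
]
group = assign_group([0.8, 0.2, 0.7, 0.9], known, locations=[0, 1, 2])
```

Note that the assignment needs only intensities, not the nucleotide sequence of the sample.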

69. The method of claim 68, wherein the group is
selected from the group consisting of species, subspecies,
genotype, and phenotype.



70. The method of claim 68, wherein the group to
which a sample nucleic acid sequence is assigned is determined
without requiring knowledge of the actual nucleotide sequence
of the sample nucleic acid sequence.

71. The method of claim 68, further comprising the
step of normalizing hybridization intensities of the
hybridization patterns of the sample and known nucleic acid
sequences using linear regression.

72. The method of claim 71, wherein the comparing
step includes utilizing a regression coefficient from the
linear regression for comparison.
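
One way to read Claims 71 and 72 is sketched below in Python: the sample intensities are regressed on the known (reference) intensities by ordinary least squares, the fitted line is used to rescale the sample, and a coefficient from the fit is kept for the comparison. Which coefficient the claims intend is not stated; the sketch returns the correlation coefficient r as an assumption. All function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the correlation coefficient.
    Assumes x and y are equal-length and neither is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def normalize_to_known(sample, known):
    """Rescale sample intensities onto the scale of the known pattern."""
    slope, intercept, r = linear_fit(known, sample)
    normalized = [(s - intercept) / slope for s in sample]
    return normalized, r


normalized, r = normalize_to_known([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0])
```

A coefficient near 1 indicates that the sample pattern tracks the known pattern closely after scaling, which could serve as the match score when comparing against several known patterns.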

73. The method of claim 68, further comprising the
step of generating a database of the hybridization patterns
for the plurality of known nucleic acid sequences.

74. The method of claim 68, wherein the reference
nucleic acid sequence is from Mycobacterium tuberculosis.

75. The method of claim 68, wherein the locations
include locations of species-specific polymorphisms.

76. The method of claim 68, wherein the locations
include locations of shared polymorphisms between or among
multiple species.

77. The method of claim 68, further comprising the
step of calculating a probability that the sample nucleic acid
sequence belongs to the particular group.

78. The method of claim 68, wherein the group is a
species of Mycobacterium.

79. The method of claim 68, wherein the known and
sample nucleic acid sequences include a highly conserved gene.

80. The method of claim 68, wherein the known and
sample nucleic acid sequences include a gene selected from the
group consisting of 16S rRNA, the rpoB gene, the katG gene, the
inhA gene, the gyrA gene, the 23S rRNA gene, the rrs gene, the
pncA gene, and the rpsL gene.

81. A computer program product that assigns an 
organism to a group, comprising: 

computer code that receives as input groups of a 
plurality of known nucleic acid sequences, the plurality of 
known nucleic acid sequences being from known organisms; 

computer code that receives as input hybridization 
patterns for the plurality of known nucleic acid sequences, 
each hybridization pattern indicating hybridization of 
subsequences of the known nucleic acid sequence to 
subsequences of a reference nucleic acid sequence; 

computer code that receives as input a hybridization 
pattern for a sample nucleic acid sequence from the organism 
indicating hybridization of subsequences of the sample nucleic 
acid sequence to subsequences of the reference nucleic acid 
sequence; 

computer code that compares the hybridization 
pattern for the sample nucleic acid sequence to the 
hybridization patterns for the plurality of known nucleic acid 
sequences;

computer code that assigns a particular group to 
which the organism belongs according to the groups of at least 
one of the known nucleic acid sequences that has a 
hybridization pattern that most closely matches the
hybridization pattern of the sample nucleic acid sequence at 
specific locations; and 

a computer readable medium that stores the computer
codes.

82. In a computer system, a method of assigning
groups to which organisms belong utilizing a generic probe
array, comprising the steps of:

inputting hybridization intensities for a plurality
of isolates, the hybridization intensities indicating
hybridization affinity between the isolate and the generic
probe array;

selecting hybridization intensities that have the
most variance across the plurality of isolates; and

assigning each of the plurality of isolates to a
group according to the selected hybridization intensities.

83. The method of claim 82, wherein the group is
selected from the group consisting of species, subspecies,
genotype, and phenotype.

84. The method of claim 82, wherein the assigning
step comprises the step of clustering the plurality of
isolates into groups according to the selected hybridization
intensities.

85. The method of claim 84, wherein the clustering
step is selected from the group consisting of Principal
Components analysis and Variable Clustering analysis.

86. The method of claim 82, further comprising the
step of standardizing the hybridization intensities among the
plurality of isolates.

87. The method of claim 86, wherein the
standardizing step comprises the step of adjusting the
hybridization intensities of each isolate so that there is a
common mean and variance across the plurality of isolates.
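
The standardizing step of Claim 87 and the selecting step of Claim 82 can be illustrated with the short Python sketch below. Each isolate is rescaled to zero mean and unit variance, then the probes whose standardized intensities vary most across isolates are picked. The choice of zero mean and unit variance as the common values, the function names and the list-of-lists input are assumptions; the clustering named in Claim 85 is not implemented here.

```python
import statistics


def standardize(isolate):
    """Rescale one isolate's intensities to mean 0 and variance 1.

    Assumes the isolate's intensities are not all identical.
    """
    mean = statistics.fmean(isolate)
    sd = statistics.pstdev(isolate)
    return [(value - mean) / sd for value in isolate]


def most_variable_probes(isolates, k):
    """Return indices of the k probes with the largest variance across isolates.

    isolates: list of equal-length intensity lists, one per isolate.
    """
    n_probes = len(isolates[0])
    variances = [
        statistics.pvariance([isolate[j] for isolate in isolates])
        for j in range(n_probes)
    ]
    ranked = sorted(range(n_probes), key=lambda j: variances[j], reverse=True)
    return ranked[:k]


raw = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
standardized = [standardize(isolate) for isolate in raw]
selected = most_variable_probes(standardized, k=2)
```

The selected probe columns would then be passed to whatever grouping procedure is used, for example a principal components analysis.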

88. The method of claim 82, wherein the generic
probe array includes all nucleic acid probes of a specific
length.

89. A computer program product that assigns groups
to which organisms belong utilizing a generic probe array,
comprising the steps of:

computer code that receives as input hybridization
intensities for a plurality of isolates, the hybridization
intensities indicating hybridization affinity between the
isolate and the generic probe array;

computer code that selects hybridization intensities
that have the most variance across the plurality of isolates;

computer code that assigns a group to each of the
plurality of isolates according to the selected hybridization
intensities; and

a computer readable medium that stores the computer
codes.

ABSTRACT 

DNA amplification systems are powerful technologies 
with the potential to impact a wide range of diagnostic 
applications. In this study we explored the feasibility 
and limitations of a modified ligase chain reaction 
(Gap-LCR) in detection and discrimination of DNAs 
that differ by a single base. LCR is a DNA amplification 
technology based on the ligation of two pairs of 
synthetic oligonucleotides which hybridize at adjacent 
positions to complementary strands of a target DNA. 
Multiple rounds of denaturation, annealing and ligation 
with a thermostable ligase result in the exponential 
amplification of the target DNA. A modification of LCR, 
Gap-LCR was developed to reduce the background 
generated by target-independent, blunt-end ligation. In 
Gap-LCR, DNA polymerase fills in a gap between 
annealed probes which are subsequently joined by 
DNA ligase. We have designed synthetic DNA targets 
with single base pair differences and analyzed them in 
a system where three common probes plus an allele- 
specific probe were used. A single base mismatch 
either at the ultimate 3' end or penultimate 3' end of the 
allele specific probe was sufficient for discrimination, 
though better discrimination was obtained with a 
mismatch at the penultimate 3' position. Comparison 
of Gap-LCR to allele-specific PCR (ASPCR) suggested 
that Gap-LCR has the advantage of having the additive 
effect of polymerase and ligase on specificity. As a 
model system, Gap-LCR was tested on a mutation in 
the reverse transcriptase gene of HIV, specifically, one 
of the mutations that confers AZT resistance. Mutant 
DNA could be detected and discriminated in the 
presence of up to 10 000-fold excess of wild-type DNA. 

INTRODUCTION 

The ability to detect single base changes is of great importance in 
molecular genetics. Specific identification of point mutations in 
the human genome plays a major role in diagnosis of hereditary 
diseases and in identification of mutations within oncogenes, 
tumor suppressor genes and of mutations associated with drug 
resistance. 



Single base variations have been analyzed by a variety of 
techniques, such as restriction fragment length polymorphism 
(1), denaturing gradient gel electrophoresis (2) and chemical 
cleavage of mismatched heteroduplexes (3). Other techniques 
include RNAse cleavage of mismatched bases (4) and single 
strand conformation polymorphism (5). All of these techniques 
have the advantage of being able to screen for unknown 
mutations. Yet they are very labor-intensive, multistep, non-automated
processes and, most importantly, lack sensitivity (6). 
Recently, highly sensitive amplification-based techniques have 
been developed, among which are hybridization of allele-specific 
oligonucleotides to polymerase chain reaction (PCR)-amplified 
products (7,8) and competitive oligonucleotide priming, where 
differential amplification depends on differential hybridization 
(9). The amplification refractory mutation system (10), also 
referred to as allele-specific PCR (ASPCR) (11), which relies on 
positioning the mutation at the 3' end of a PCR primer, and the 
ligase chain reaction (LCR), where a mismatch is positioned at 
the ligation joint (12-14), are two other amplification technol- 
ogies used for analysis of single base mutations. 

In the LCR, two pairs of synthetic oligonucleotides which 
hybridize at adjacent positions to complementary strands of a 
target DNA are joined by a thermostable ligase. Multiple rounds 
of denaturation, annealing and ligation result in the exponential 
amplification of the target DNA (Fig. 1A) (13-18). Targets that 
differ by a single base pair are discriminated, since a mismatch at 
the ligation joint severely reduces the efficiency of ligation 
(12-14,19). Generation of target-independent ligation products 
due to blunt-end ligation poses limitations on the sensitivity of 
LCR (20). Typically, the sensitivity of LCR or any diagnostic 
assay is not a critical factor for detection of mutations in human 
genetic diseases, where 50 or 100% of DNA contains the 
mutation. In contrast, for detection of somatic mutations within 
oncogenes, tumor suppressor genes or drug resistance mutations, 
where a small number of mutated molecules need to be detected 
in the presence of excess wild-type DNA, sensitivity becomes a 
critical factor. 
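
The exponential amplification described above can be made concrete with a small Python calculation. It assumes an idealized model in which each cycle multiplies the number of ligated product molecules by (1 + efficiency); the function name and the efficiency parameter are illustrative and are not taken from the study.

```python
def lcr_copies(initial_targets, cycles, efficiency=1.0):
    """Idealized product count after a number of LCR cycles.

    efficiency is the assumed fraction of targets that yield a ligated
    product in each cycle (1.0 means perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_targets * (1.0 + efficiency) ** cycles


perfect = lcr_copies(100, 10)          # 100 targets, 10 perfect cycles
no_ligation = lcr_copies(100, 10, 0.0)  # ligation fully blocked
```

Under this model a mismatch that lowers the per-cycle ligation efficiency is compounded over every cycle, which is why even partial discrimination at the ligation joint separates matched from mismatched targets after many rounds.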

Several approaches have been taken to increase the sensitivity 
of LCR. One approach has been to use another amplification 
technology, such as PCR, followed by limited amplification with 
LCR (21,22). Other alternatives are PCR followed with the 
ligation detection reaction (LDR), where only two adjacent 
probes are used, resulting in linear amplification (13,14,20,21), 
or PCR followed with the oligonucleotide ligation assay (OLA), 
where ligation of two adjacent probes is used as a single detection 
step (12,19,23). However, these combined approaches necessitate 
the opening of tubes after PCR, generating a source of contamination 
and also introducing complexity to automation. 

Figure 1. Diagrammatic representation of LCR and Gap-LCR. The complementary 
strands of target DNA are represented as shaded bars, LCR probes as 
solid bars and regions extended by DNA polymerase as white bars. (A) In LCR, 
four probes covering the entire target sequence anneal to complementary 
strands, probe 1 is ligated to probe 3 and probe 2 to probe 4 by a thermostable 
DNA ligase. The ligated probes function as targets in subsequent cycles and 
exponential amplification is achieved. (B) In Gap-LCR, probes 1 and 4 have 3' 
overhangs with respect to their complements. Probes 1 and 4 are extended by 
DNA polymerase with the appropriate nucleotide(s) to fill the gap and ligated 
to probes 3 and 2 respectively with the ligase. 

A modification of LCR, Gap-LCR, has been introduced to 
circumvent these difficulties and improve the sensitivity of LCR 
(24-27). In Gap-LCR, complementary probe pairs containing 3' 
extensions are used. After hybridization to target DNA, a gap of 
one to several bases exists between adjacent probes. A thermostable 
DNA polymerase, devoid of 3'→5' exonuclease activity, 
and the appropriate nucleotide(s) are used to fill the gap and the 
resultant probes are joined by DNA ligase (Fig. 1B). The use of 
probe duplexes with non-complementary 3' extensions prevents 
the generation of target-independent ligation products. Gap-LCR 
has been successfully applied to detect <10 target molecules in 
a reaction (unpublished results). Amplification products are 
detected by a sandwich immunoassay performed with an 
automated analyzer (17,26). The sensitivity, specificity and 
automation of the technology make Gap-LCR a good candidate 
for diagnostic tests. 

In this study we explored the properties of Gap-LCR in the 
detection and discrimination of target DNA sequences that differ 
by a single base pair. 

MATERIALS AND METHODS 
Oligonucleotides and plasmids 

Target DNAs (50 nt) were synthesized and gel purified by 
Genosys (The Woodlands, TX). The sequence of the wild-type 
target was derived from the sequence of the Chlamydia trachomatis 
cryptic plasmid, map position 2230-2280 (28). Mutant A 
and Mutant B targets were identical to the wild-type target, except 
single base changes were introduced at the indicated positions to 
both strands during synthesis (Fig. 2A). Targets for HIV 
experiments (Fig. 5) were gifts from Dr Steve Wolinsky 
(Northwestern University) and Dr John Mellors (University of 
Pittsburgh). They were provided as purified DNA from plasmids 
containing a 1.7 kb fragment of the HIV genome cloned into 
EcoRI and HindIII sites of the vector pKK233 (Pharmacia). The 
mutant sequence has a mutation at amino acid 215 which changes 
the codon from ACC to TAC (from threonine to tyrosine) (Fig. 
5A). The sequence of the region used as the target for LCR is 
5'-AACATCTGTTGAGGTGGGGATTTACCACACACCAGACAAAAAACATCAGA. 
The LCR probe sets were synthesized 
on an Applied Biosystems synthesizer 394 by the phosphoramidite 
method. The 5' end of probe 1 and 3' end of probe 2 were 
covalently linked to carbazole, while the 3' end of probe 3 and 5' 
end of probe 4 were linked to adamantane (Fig. 2A). In the 
experiments where both PCR and LCR were performed (Fig. 4), 
only probes 1 and 4 were haptenated. Probes were purified on a 
12% denaturing polyacrylamide gel (29). Quantitation was by 
absorbance at 260 nm. 



LCR and PCR amplification 

LCR and PCR reactions contained 500 ng human placental DNA 
with either no target DNA (negative control) or with 100 
molecules of target DNA unless stated otherwise. LCR reactions 
were run in a buffer containing 50 mM EPPS, pH 7.8, 30 mM 
MgCl2, 20 mM K+, 10 µM NAD, 1-10 µM gap filling 
nucleotides, 30 nM each oligonucleotide probe, 1 U Thermus 
flavus DNA polymerase, lacking 3'→5' exonuclease activity 
(MBR, Milwaukee, WI), and 5000 U T.thermophilus DNA ligase 
(Abbott Laboratories; 1 U is the amount of DNA ligase producing 
1 nM ligated product in 10 min at 55°C at pH 7.8). Reaction 
volume was 50 µl and each reaction was overlaid with 50 µl 
mineral oil prior to cycling in a Perkin Elmer 480 thermocycler. 
Cycling conditions consisted of a 30 s incubation at 85°C and a 
30 s incubation at 60°C. Cycle numbers are indicated in figure 
legends. PCR reactions were run under the same conditions as 
LCR, except all four dNTPs were used and probes 2 and 3 and 
ligase were omitted. For the amplification of HIV sequences, 
concentrations of LCR reagents were as described above except 
0.5 U DNA polymerase was used. The reaction cycling condi- 
tions consisted of a 3 min denaturation at 94°C followed by 38 
cycles of 1 s at 94°C, 1 s at 55°C and 30 s at 64°C. 
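
To put the reaction composition in perspective, the sketch below converts the stated inputs (100 target molecules and 30 nM of each probe in a 50 µl reaction) into a molar target concentration and a number of probe molecules. Only the figures quoted above are used; the function name is illustrative, and the calculation assumes ideal mixing in the full reaction volume.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

REACTION_VOLUME_L = 50e-6   # 50 µl reaction volume, from the text
TARGET_MOLECULES = 100      # default target input, from the text
PROBE_MOLAR = 30e-9         # 30 nM of each probe, from the text


def molar_concentration(molecules: float, volume_litres: float) -> float:
    """Convert a molecule count in a given volume to mol/l."""
    return molecules / AVOGADRO / volume_litres


target_molar = molar_concentration(TARGET_MOLECULES, REACTION_VOLUME_L)
probe_molecules = PROBE_MOLAR * REACTION_VOLUME_L * AVOGADRO

if __name__ == "__main__":
    print(f"target concentration: {target_molar:.2e} M")
    print(f"molecules of each probe: {probe_molecules:.2e}")
    print(f"probe molecules per target molecule: {probe_molecules / TARGET_MOLECULES:.1e}")
```

Under these assumptions the target is present at roughly 3 × 10^-18 M, with on the order of 10^10 molecules of each probe per target molecule.
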

Figure 2. Design and specificity of Gap-LCR probe sets for the amplification of DNAs that differ by a single base pair. Double-stranded synthetic target DNAs, 
designated wild-type (WT), mutant A or mutant B are shown. Only nucleotides of interest are shown; the remainder of the sequences, represented by black bars, are 
identical in all targets. The changed nucleotides in the mutant A and mutant B targets with respect to the wild-type target are highlighted. Gap-LCR probes are 
represented as gray bars. Carbazole is represented as squares and adamantane as circles. (A) Gap-LCR probes for the specific amplification of the wild-type target. 
Probe 4 is complementary to the wild-type target and has a single mismatch (X) with mutant A and mutant B targets. (B) Gap-LCR probes for the specific amplification 
of the mutant A target. Probes 1, 2 and 3 are the same as in (A); probe 4a is different from probe 4wt and the change is highlighted. P4a is complementary to the mutant 
A target and has a mismatch (X) with the wild-type target. (C) Gap-LCR probes for specific amplification of the mutant B target. The change in probe 4b is highlighted. 
P4b is complementary to the mutant B target and has a mismatch (X) with the wild-type target. (D) Specificity of the Gap-LCR probes. The wild-type-, mutant A- and 
mutant B-specific probe sets shown in (A-C) respectively were tested with either human placental DNA (H.P.), wild-type (WT), mutant A or mutant B targets. 
Reaction conditions are described in Materials and Methods. Samples were cycled for 25 cycles and products were detected using the Abbott IMx® automated 
immunoassay as counts/second/second (c/s/s) as described (17). 



Detection of amplified products 

Amplification products were detected via a sandwich immunoas- 
say performed using the Abbott IMx® automated analyzer. 
Amplification products were captured using anti-carbazole 
coated microparticles. After a washing step, the captured 
products were detected using an anti-adamantane-alkaline phos- 
phatase conjugate which, in the presence of methylumbelliferone 
phosphate, generates a fluorescent product at a rate proportional 
to the amount of captured product. The average IMx® rate from 
duplicate samples was taken and standard deviations are shown. 

For the detection of LCR products on polyacrylamide gel (Fig. 
3B), unhaptenated probe 1 was phosphorylated at the 5' end using 
the Gibco BRL 5' DNA terminus labeling system and 50 µCi 
[γ-32P]ATP (Amersham). LCR reactions were set up as described 
above, except that equal amounts of radiolabeled and cold probe 
1 were used (15 nM of each per reaction), and samples were 
cycled for 43 cycles. For restriction analysis, 15 µl of the 
amplified product was incubated with 1.5 µl HaeIII (10 U/µl) and 
1.8 µl 10× buffer (Promega) for 1 h at 37°C; the controls (- lanes) 
were also incubated with 10× buffer at 37°C in the absence of 
HaeIII. Products were separated by electrophoresis on 12% 
denaturing polyacrylamide gels (29). 

RESULTS 

Design of targets and Gap-LCR probe sets 

Synthetic double-stranded DNA targets, designated wild-type or 
mutant, that differed by a single base pair were designed as shown 
in Figure 2A. Gap-LCR probe sets specific for the amplification 
of each target DNA were synthesized. The probes were staggered, 
i.e. probes 1 and 4 had 3' overhangs when hybridized to their 
complements. When annealed to the target DNA, probes 1 and 4 
were extended by DNA polymerase, in the presence of appropri- 
ate nucleotide(s). The extended probes were then ligated to 
probes 3 and 2 respectively. The ligated products could function 
as targets in subsequent cycles, thus allowing exponential 
amplification. Probes 1, 2 and 3 were common and probe 4 was 
specific for each target; probes 4wt, 4a and 4b were designed to 
specifically amplify wild-type, mutant A and mutant B targets 
respectively and had a single mismatch with the non-analogous 

Figure 3. (A) The effect of cycle number on specificity of Gap-LCR. The 
wild-type-specific probe set was used with human placental (H.P.), wild-type 
(WT), mutant A and mutant B DNAs as shown in Figure 2A. Reaction 
conditions are as described in Materials and Methods and cycle numbers are 
indicated. (B) Analysis of 'overamplified' LCR products. The wild-type-spe- 
cific probe set was tested with wild-type, mutant B and mutant A targets as 
shown in Figure 2A. The mutant B-specific probe set was tested with wild-type 
and mutant B target as shown in Figure 2C. Probe 1 was radiolabeled with 32P 
and reactions were cycled for 43 cycles. Amplified products were divided in 
two; one set was restricted with HaeIII (+ lanes), the other set was not (- lanes). 
Products were electrophoresed on a 12% denaturing gel and detected by 
autoradiography. 



targets (Fig. 2A-C). The mutation was positioned so that on one 
strand it was complementary to one of the bases to be filled during 
the extension of probe 1 and on the other strand it was mismatched 
with respect to probe 4. To assess the effect of the mismatch 
position on specificity, the position of the mutation was varied to 
generate a C:C mismatch either at the ultimate 3' end or 
penultimate 3' end of probes 4a and 4b with respect to the 
wild-type target (Fig. 2B and C). Similarly the wild-type-specific 
probe, 4wt, had a G:G mismatch either at the ultimate 3' end or 
penultimate 3' end with mutant A and mutant B targets 
respectively (Fig. 2A). 

Specificity of Gap-LCR probe sets 

The specificity of the probe sets is shown in Figure 2D. Mutant 
A, mutant B, wild-type or human placental DNA (negative 
control) were amplified with the different probe sets. With human 
placental DNA, amplified product was not observed, indicating 
that target-independent non-specific amplification was not 
significant. Mutant probe sets amplified only their respective 



mutant targets, whereas the wild-type probe set amplified only the 
wild-type target and not the mutant targets. These results 
demonstrate that a single base mismatch positioned either at the 
ultimate 3' or penultimate 3' end of probe 4 is sufficient to provide 
discriminative amplification by Gap-LCR under the conditions 
used in this study. 

As has been shown for ASPCR and LCR (30), discrimination 
by Gap-LCR can be adversely affected by increasing the number 
of amplification cycles. To determine the maximum number of 
cycles where the amplification remains specific, wild-type, 
mutant A, mutant B and human placental DNAs were amplified 
with the wild-type-specific probe set in the presence of dCTP for 
20, 25, 30, 35 or 40 cycles (Fig. 3A). Amplified product was 
detected after 20 cycles with wild-type target, after 30 cycles with 
mutant A target and after 35 cycles with mutant B target. No 
product was detected with human placental DNA even after 40 
cycles. This result suggests that there is a window of about 10 
cycles where the amplification is most specific. Similar results 
were observed when mutant-specific probes were used with each 
target (data not shown). 
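
The cycle window described above can be related to a fold difference in amplifiable template with a simple exponential model. The sketch below is an idealization, not the authors' analysis: it assumes a constant per-cycle efficiency (perfect doubling by default) and treats the first cycle number at which product was reported as the detection cycle, although only 20, 25, 30, 35 and 40 cycles were tested. The function names are illustrative.

```python
import math


def fold_from_delay(delay_cycles: float, efficiency: float = 1.0) -> float:
    """Fold difference in product implied by a delay of `delay_cycles`,
    assuming each cycle multiplies product by (1 + efficiency)."""
    return (1.0 + efficiency) ** delay_cycles


def delay_from_fold(fold: float, efficiency: float = 1.0) -> float:
    """Number of extra cycles needed to make up a given fold difference."""
    return math.log(fold) / math.log(1.0 + efficiency)


# First tested cycle number at which product was reported (Fig. 3A in the text).
DETECTION_CYCLE = {"wild-type": 20, "mutant A": 30, "mutant B": 35}

if __name__ == "__main__":
    for name in ("mutant A", "mutant B"):
        delay = DETECTION_CYCLE[name] - DETECTION_CYCLE["wild-type"]
        print(f"{name}: delayed {delay} cycles, "
              f"about {fold_from_delay(delay):.0f}-fold under perfect doubling")
```

With perfect doubling, a 10-cycle delay corresponds to roughly a 1000-fold difference; a lower per-cycle efficiency would make the same delay correspond to a smaller fold difference.
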

When the wild-type probe set was used with mutant targets 
(Fig. 2A), identical specificity was seen when dGTP was omitted 
or added to the reaction, suggesting that omission of the 
nucleotide to fill the base complementary to the mutation does not 
significantly contribute to specificity (data not shown). This 
result was expected, since extension of probe 1 with dCTP and 
dGTP and ligation to probe 3 would not generate a perfect 
substrate for probes 2 and 4; probe 4 would still be mismatched 
with the ligated substrate and be refractory to amplification (Fig. 
2A). In contrast, extension from the mismatched probe 4 and 
ligation to probe 2 would generate a ligated product that would be 
a perfect substrate for probes 1 and 3, in which case dGTP would 
not be needed and wild-type product would be generated (Fig. 
2A). This prediction was confirmed experimentally by analyzing 
the products that were generated after over-amplification. Wild- 
type and mutant targets were amplified with the wild-type probe 
set for 43 cycles (where products from mismatched targets are 
generated) in the presence of both dGTP and dCTP and the nature 
of the amplified products was analyzed (Fig. 3B). Amplified 
products were digested with the restriction enzyme HaeIII, which 
cleaves at the GGCC site which would be present only on 
wild-type products (Fig. 2A). For this experiment, the 5' end of 
probe 1 was radiolabeled and the products were detected on a 
denaturing polyacrylamide gel. The results demonstrate that the 
products generated from both matched and mismatched targets 
were cleaved by HaeIII; thus wild-type product was generated in 
all cases (Fig. 3B). In contrast, products amplified with the mutant 
B-specific probe set were not cleaved by HaeIII. Products 
generated with the mutant A-specific probe set were not cleaved 
by HaeIII either (data not shown). These results confirm the 
prediction that dGTP is not utilized in the generation of products 
when the wild-type probe set is used with mutant targets. 
Therefore omission of dGTP does not significantly contribute to 
the specificity. 
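
The restriction test above rests on whether a product carries the GGCC recognition site. A minimal sketch of that check follows. The two example sequences are hypothetical placeholders, not the actual wild-type and mutant products (which are shown only in Fig. 2), and the cut position reflects the blunt cut that HaeIII makes between GG and CC.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition sequence; the cut falls between GG and CC


def haeiii_cut_positions(seq: str) -> list[int]:
    """Return the 0-based index of the first base after each HaeIII cut."""
    seq = seq.upper()
    positions = []
    start = seq.find(HAEIII_SITE)
    while start != -1:
        positions.append(start + 2)
        start = seq.find(HAEIII_SITE, start + 1)
    return positions


# Hypothetical sequences for illustration only: a single base change in the
# second sequence destroys the GGCC site present in the first.
wild_type_like = "ATTAGGCCTTAG"
mutant_like = "ATTAGGCGTTAG"

if __name__ == "__main__":
    print("wild-type-like cut positions:", haeiii_cut_positions(wild_type_like))
    print("mutant-like cut positions:", haeiii_cut_positions(mutant_like))
```

A product that yields an empty list would be expected to remain uncut, as observed for the products of the mutant-specific probe sets.
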

We explored the specificity of Gap-LCR with increasing 
numbers of target molecules. To determine the maximum number 
of mismatched target molecules where the amplification remains 
specific, the wild-type probe set was tested with increasing 
concentrations of matched (wild-type) or mismatched (mutant) 
targets (Fig. 4A). The results indicate that while 10 molecules of 
the matched target were detected using optimal cycle numbers, 

Figure 4. Comparison of Gap-LCR to ASPCR. (A) The wild-type-specific Gap-LCR probe set was tested with wild-type, mutant A and mutant B targets as shown 
in Figure 2A, except only probes 1 and 4 were haptenated. Target concentrations are as shown. (B) PCR reactions were as described in Materials and Methods; reactions 
were cycled for 23 cycles. The primers used for PCR are shown in the lower panel. They are identical to probes 1 and 4 used in Gap-LCR (Fig. 2A). The primers are 
specific for amplification of the wild-type target. Primer 4 has a mismatch at the ultimate 3' end with the mutant A target and at its penultimate 3' end with the mutant 
B target (Xs). 



detection of the mismatched targets occurred only with 10^4-10^5 
molecules. The loss of discrimination with mutant A target (10^4 
molecules) preceded the loss of discrimination with mutant B 
target (10^5 molecules). Similar results were seen when the cycle 
number was increased beyond the optimum (Fig. 3A); product 
was detected after 30 cycles with mutant A target and 35 cycles 
with mutant B target. Both mutant targets amplified at the same 
rate with their respective matched probes and the differential rate 
of amplification of mutant targets was only observed with 
mismatched wild-type probes. These results suggest that a 
mismatch positioned at the penultimate 3' end is discriminated 
better than a mismatch at the ultimate 3' end. 

In Gap-LCR, discrimination between targets that differ by a 
single base may rely on three steps: (i) hybridization of 
mismatched probes; (ii) fidelity of the polymerase to extend from 
mismatches; (iii) specificity of the ligase to join probes extended 
from mismatches. ASPCR also requires the first two steps, yet 



Gap-LCR may have the additional level of specificity required by 
the necessity for proper ligation. To address this question, 
specificity of Gap-LCR and ASPCR were compared under the 
same reaction conditions (Fig. 4). ASPCR experiments were 
performed with the same targets using only two of the haptenated 
probes (probes 1 and 4), all four nucleotides and same reaction 
conditions utilized for Gap-LCR. For this comparative study, 
only probes 1 and 4 were linked to haptens for Gap-LCR. 
Detection of products relied on complementarity of the strands 
linked to the two haptens. Results indicate that with ASPCR, 
better discrimination was observed when the mismatch was at the 
ultimate 3' end than at the penultimate 3' end. This is in contrast 
to the observation made using Gap-LCR (Figs 3A and 4A). 
Comparing the two amplification procedures, mismatched targets 
were amplified at a faster rate in ASPCR than Gap-LCR, while 
the amplification rate of the matched target was equivalent in both 
reactions. This difference was enhanced when the mismatch was 

Figure 5. Gap-LCR for the detection of an AZT resistance mutation. (A) 
Design of the Gap-LCR probe set for specific amplification of the mutant DNA. 
The wild-type and mutant target DNAs comprising 50 bases of the HIV reverse 
transcriptase gene are shown as solid bars. Codon 215 is underlined. The 
mutations in codon 215 are highlighted. Probes are represented as gray bars and 
haptenated as described in Figure 2. Probe 4mut is specific for amplification 
of the mutant DNA and has a 3' terminal mismatch (X) with the wild-type 
target. (B) The mutant-specific probe set was tested with increasing concentra- 
tions of wild-type or mutant targets as shown in (A). Reaction conditions are as 
described in Materials and Methods. (C) The mutant-specific probe set was 
tested with 50 molecules of mutant target mixed with increasing concentrations 
of the wild-type target (shaded bars) or with increasing concentrations of the 
wild-type target alone (open bars). The ratio of the wild-type target to the mutant 
target is indicated above the shaded bars. Reaction conditions are as described 
in Materials and Methods. 



at the penultimate 3' end. This observation suggests that the 
specificity of Gap-LCR does not solely rely on hybridization and 
the extension of the mismatched probe by the polymerase as in 
ASPCR and that the ligation step adds to the specificity of 
Gap-LCR. 

Gap-LCR for detection of a HIV AZT resistance 
mutation 

To show the feasibility and specificity of Gap-LCR for detection 
and discrimination of a natural mutation, a mutation at codon 215 
of the HIV reverse transcriptase gene was tested as a model 
system. Mutation at codon 215 from ACC (threonine) to TAC 
(tyrosine) has been associated with resistance to AZT 
(3'-azido-3'-deoxythymidine) (31). Probe sets specific for the 
amplification of the mutant viral DNA were designed and tested 
on cloned HIV DNA carrying wild-type or mutated sequences at 
codon 215 (Fig. 5A). Even though the wild-type and mutant 
targets differ by two bases, one of them (the first base of the 
codon) is positioned in the overlapping gap and does not create 
a mismatch with the probes. Because the necessary nucleotides 
are provided in the reaction (ATP and TTP), the change in that 
position does not contribute to discrimination. Thus, the discri- 
mination relies on a single base change; the mutant-specific probe 
set is designed to have a single mismatch at the ultimate 3' end of 
probe 4 with the wild-type DNA. The specificity of the 
mutant-specific probe set was tested with increasing concentra- 
tions of wild-type and mutant targets. While 10 molecules of 
mutant target were detected, no product was observed with up to 
10 000 molecules of wild-type target (Fig. 5B). No product was 
detected with human placental DNA (used as a negative control). 
To determine whether the mutant target could be detected in the 
presence of excess wild-type target, 50 molecules of mutant target 
were mixed with increasing concentrations of wild-type target (5 
× 10^2 to 5 × 10^5). Increasing concentrations of wild-type target in 
the absence of mutant target were tested as the control. Results 
indicate that specific amplification of the mutant target occurs in 
the presence of up to 10^4-fold excess wild-type target (Fig. 5C). 
Slight cross-reactivity with the wild-type target was observed 
when 5 × 10^5 molecules of wild-type target were used. 
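
The mixing experiment can be restated as fold excess of wild-type over mutant target. The sketch below uses the 50 mutant molecules and the 5 × 10^2 to 5 × 10^5 wild-type range given above; the intermediate tenfold steps are an assumption made for illustration, since the text states only the endpoints of the range.

```python
MUTANT_MOLECULES = 50  # from the text

# Endpoints are from the text; the tenfold steps in between are assumed.
wild_type_inputs = [5 * 10**exponent for exponent in range(2, 6)]

# Fold excess of wild-type over mutant target in each mixture.
fold_excess = [wt // MUTANT_MOLECULES for wt in wild_type_inputs]

# Fraction of all target molecules in each mixture that carry the mutation.
mutant_fraction = [MUTANT_MOLECULES / (MUTANT_MOLECULES + wt) for wt in wild_type_inputs]

if __name__ == "__main__":
    for wt, fold, fraction in zip(wild_type_inputs, fold_excess, mutant_fraction):
        print(f"{wt:>7} wild-type molecules: {fold:>5}-fold excess, "
              f"mutant fraction {fraction:.5f}")
```
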

DISCUSSION 

A reliable DNA diagnostic method requires accurate discrimina- 
tion, low background and automation. In this study we have 
shown that Gap-LCR meets these requirements; DNA targets that 
differ by a single base pair were discriminated, the background 
was low, sensitivity was high and the products of the reaction 
were detected by an automated immunoassay. In the experiments 
designed in this study, discrimination between related targets 
relied on a single base mismatch between one of the Gap-LCR 
probes and the target DNA. The extent of the discrimination 
depended on the cycle number, concentration of the mismatched 
target and the position of the mismatch. 

As with any other amplification reaction, the reaction specific- 
ity was expected to deteriorate with increasing cycle number (30). 
When mismatched probes are extended and ligated during any 
cycle, the newly formed molecules are able to function as 
templates in subsequent cycles. The products generated from the 
matched target will reach a plateau after a certain cycle number, 
while the products generated from the mismatched target will 
continue to exponentially amplify until they reach the levels seen 



with the matched target, at which point discrimination will be 
completely lost. A rough calculation of the accumulation of 
products from matched and mismatched targets has been reported 
previously for ASPCR by Ugozzoli and Wallace (30). We show 
that for Gap-LCR accumulation of products from mismatched 
targets occurs about 10-15 cycles later than the detection of 
products from the matched target. Moreover, even after 20 
additional cycles (40 cycles total), no signal was observed with 
negative control placental DNA. 

Template concentration also plays a major role in specificity. 
Similarly to an increased number of cycles, the specificity of 
Gap-LCR was expected to deteriorate when large amounts of 
mismatched target were used. We showed that under the 
conditions used in this study, as few as 10 molecules of the 
matched target were detected, while equivalent detection of the 
synthetic mismatched target required 10 000-100 000 molecules, 
depending on the position of the mismatch. With the HIV-specific 
probe set, the specificity was even better. Product was detected 
with 10 molecules of the matched target, while only a small 
amount of product was detected with 500 000 molecules of the 
mismatched target. The better specificity observed with the HIV 
probe set may be attributed to differences in sequence, in reaction 
conditions and/or the nature of the targets used in these studies (50 
bp linear double-stranded synthetic targets versus 8 kb circular 
plasmid DNA). Our results indicate that with Gap-LCR, good 
specificity can be obtained under conditions where exquisite 
sensitivity is maintained. 

Our studies demonstrate that a single mismatch between one of 
the Gap-LCR probes and the target is sufficient for discrimination 
of single base substitutions. The specificity seems to rely solely 
on the efficiency of extension and ligation of the mismatched 
probe. Omission of the nucleotide complementary to the mutated 
base in the fill does not significantly add to the specificity. Once 
the mismatched probe is extended and ligated, it generates a target 
for the complementary probes, which can extend and ligate in the 
absence of the omitted nucleotide. After such an event, amplifica- 
tion is exponential in the following cycles. The omission of the 
nucleotide complementary to the mutation would effectively 
prevent amplification if the mutation was positioned in an 
overlapping gap, where probes need to be designed not to cover 
the mutated base in either strand. However, such a scheme has 
limited application. It can only be used if the mutation is an A or 
T change to a C or G or vice versa. Other changes would 
necessitate the same fill in the overlapping gap. 
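
The restriction on the overlapping-gap scheme follows from base pairing: filling a gap over a given base pair on both strands needs that base and its complement. The minimal sketch below encodes this reasoning; the function names are illustrative, and the model considers only which nucleotides are required for the fill, not polymerase or ligase behaviour.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fill_set(base: str) -> frozenset:
    """Nucleotides needed to fill a gap over this base pair on both strands."""
    return frozenset({base, COMPLEMENT[base]})


def omission_can_discriminate(wild_type_base: str, mutant_base: str) -> bool:
    """True when the two alleles need different nucleotides in the gap, so that
    leaving out the nucleotides of one allele blocks its amplification."""
    return fill_set(wild_type_base) != fill_set(mutant_base)


if __name__ == "__main__":
    for wt, mut in [("A", "C"), ("T", "G"), ("A", "T"), ("C", "G")]:
        print(f"{wt} -> {mut}: omission discriminates = "
              f"{omission_can_discriminate(wt, mut)}")
```

Changes between an A or T and a C or G need different fills and can be discriminated by omission, whereas A to T and C to G changes need the same pair of nucleotides.
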

In several previous reports where single base mismatches were 
not refractory to amplification, further deliberate mismatches 
were introduced to achieve discrimination with ASPCR or 
blunt-end LCR (10,11,30,32-35). For ASPCR, Newton et al 
(10) reported that the primers became increasingly refractory to 
amplification as the additional mismatch was moved progressive- 
ly closer to the 3' end of the PCR primer. Under the conditions 
used in our study, a second mismatch was not necessary and in 
fact positioning a second mismatch next to the terminal mismatch 
would likely result in a failure to amplify either target, since we 
demonstrated that a single base mismatch one base from the 3' 
end was inhibitory to amplification. We have also observed that 
a mismatch two bases from the 3' end was refractory to 
amplification with Gap-LCR (data not shown). Reaction condi- 
tions may be optimized to accommodate additional mismatches. 
Whether such an approach would increase the specificity of 
Gap-LCR remains to be explored. 



Nucleic Acids Research, 1995, Vol. 23. No. 4 681 

It was previously reported that the nature of the mismatch 
affects both the polymerase extension and ligation efficiencies 
(10,14,32,33). However, in ASPCR conflicting results were 
obtained for the same mismatches in different reports, presuma- 
bly due to differences in primer length, surrounding sequences 
and reaction conditions (30). Although our studies were not 
designed to compare the effect of base pair composition on the 
specificity of Gap-LCR, we observed that G:G, C:C and C:T 
mismatches were all refractory to amplification with Gap-LCR. 
Nevertheless, to assess the effect of mismatch position on 
Gap-LCR specificity, we chose to limit our comparison to the 
same mismatch (G:G), positioned at two different locations at the 
ultimate or penultimate 3' end of probe 4wt. We found that better 
specificity was obtained when the mismatch was at the penulti- 
mate 3' end. Previous studies on the effect of mismatch 
positioning on the specificity of polymerases or ligases are 
limited in scope. The effect of mismatches for Taq polymerase 
and T4 ligase have been shown to be greatest at the ultimate 3' 
position (32,36). Our PCR results with T.flavus polymerase are in 
agreement with these observations. Yet in Gap-LCR we observed 
better specificity with the 3' penultimate mismatch. This differ- 
ence may be due to the nature of the ligase used in our studies 
and/or to the combinatory effect of ligase and polymerase as it is 
in Gap-LCR. Direct comparison of ligase specificity in Gap-LCR 
to ligase specificity in the absence of polymerase in blunt-LCR is 
not feasible. Generation of target-independent ligation products 
is very common in blunt-LCR and would precede detection of 
products from mismatched targets. Moreover, the position of the 
mismatch in blunt-LCR probe and Gap-LCR probe cannot be 
directly compared; in Gap-LCR, depending on the size of the gap, 
a mismatch at the 3' end of the probe becomes a mismatch 2-4 
bases away from the ligation junction after polymerase extension. 

To determine whether discrimination obtained with Gap-LCR 
was solely due to the specificity of the polymerase or to the 
additive specificity of polymerase and ligase, we compared 
discrimination obtained with polymerase alone (in ASPCR) to 
discrimination obtained with polymerase plus ligase (in Gap- 
LCR) under the same reaction conditions. Our data indicate that 
the specificity of Gap-LCR depends on the fidelity of polymerase 
extension as well as on the specificity of ligation. The difference 
between Gap-LCR and ASPCR was further enhanced when the 
mismatch was positioned at the penultimate 3' base, since at that 
position the specificity increased for Gap-LCR but decreased for 
ASPCR when compared to the mismatch at the 3' end. Specificity 
of PCR could have been improved by optimizing conditions. 
However, these experiments were not aimed at comparing the 
performance of Gap-LCR versus ASPCR; they were designed to 
determine whether the specificity of Gap-LCR relied on polymer- 
ase alone or on polymerase plus ligase, thus the same reaction 
conditions needed to be utilized. 

The potential of this amplification method to detect a mutated 
DNA sequence present at low copy number in a high background 
of wild-type DNA was evaluated. We observed that detection of 
mutant DNA in the presence of up to 10^4-fold excess wild-type 
DNA was feasible. This result demonstrates the advantage of 
Gap-LCR over blunt-LCR. With blunt-LCR, the signal obtained 
from mutant DNA in the presence of 100-fold excess wild-type 
DNA could not be distinguished from the background noise (37). 
Our results suggest that Gap-LCR would allow detection of 
mutations present at frequencies as low as one in 10^4 gene copies; 






thus Gap-LCR can be used in conditions where only a small 
fraction of cells are expected to contain the mutation. 

In conclusion, we have demonstrated that Gap-LCR is a 
sensitive and specific amplification technique that can accurately 
discriminate single base changes. A significant advantage of the 
Gap-LCR assay described here is the ability to specifically detect 
the reaction products using a simple automated immunoassay 
system. Our study suggests that Gap-LCR can be used as a 
powerful tool in the diagnosis of genetic diseases, in monitoring 
drug resistant pathogens and in the detection of oncogenic 
mutations. 

Automated partial DNA sequencing was conducted on 
more than 600 randomly selected human brain comple- 
mentary DNA (cDNA) clones to generate expressed se- 
quence tags (ESTs). ESTs have applications in the discov- 
ery of new human genes, mapping of the human genome, 
and identification of coding regions in genomic se- 
quences. Of the sequences generated, 337 represent new 
genes, including 48 with significant similarity to genes 
from other organisms, such as a yeast RNA polymerase II 
subunit; Drosophila kinesin, Notch, and Enhancer of split; 
and a murine tyrosine kinase receptor. Forty-six ESTs 
were mapped to chromosomes after amplification by the 
polymerase chain reaction. This fast approach to cDNA 
characterization will facilitate the tagging of most human 
genes in a few years at a fraction of the cost of complete 
genomic sequencing, provide new genetic markers, and 
serve as a resource in diverse biological research fields. 



THE HUMAN GENOME IS ESTIMATED TO CONSIST OF 50,000 
to 100,000 genes, up to 30,000 of which may be expressed 
in the brain (1). However, GenBank lists the sequence of 
only a few thousand human genes and <200 human brain messen- 
ger RNAs (mRNAs) (2). Once dedicated human chromosome 
sequencing begins in 5 years, it is expected that 12 to 15 years will 
be required to complete the sequence of the genome (3). It is 
therefore likely that the majority of human genes will remain 
unknown for at least the next decade. The merits of sequencing 
cDNA, reverse transcribed from mRNA, as a part of the human 
genome project have been vigorously debated since the idea of 
determining the complete nucleotide sequence of humans first 
surfaced. Proponents of cDNA sequencing have argued that because 
the coding sequences of genes represent the vast majority of the 
information content of the genome, but only 3% of the DNA, 
cDNA sequencing should take precedence over genomic sequencing 
(4). Proponents of genomic sequencing have argued the difficulty of 
finding every mRNA expressed in all tissues, cell types, and devel- 
opmental stages and have pointed out that much valuable informa- 
tion from intronic and intergenic regions, including control and 
regulatory sequences, will be missed by cDNA sequencing (5). 
However, many genome enthusiasts have incorrectly stated that 
gene coding regions, and therefore mRNA sequences, are readily 
predictable from genomic sequences and have concluded that there 
is no need for large-scale cDNA sequencing. In fact, prediction of 
transcribed regions of human genomic sequence is currently feasible 
only for relatively large exons (6). 

On the basis of our high output with automated DNA sequence 
analysis of 96 templates per day and consideration of the above 
issues, we initiated a pilot project to test the use of partial cDNA 
sequences (ESTs) in a comprehensive survey of expressed genes. 

Sequence-tagged sites (STSs) are becoming standard markers for 
the physical mapping of the human genome (7). These short 
sequences from physically mapped clones represent uniquely iden- 
tified map positions. ESTs can serve the same purpose as the random 
genomic DNA STSs and provide the additional feature of pointing 
directly to an expressed gene. An EST is simply a segment of a 
sequence from a cDNA clone that corresponds to an mRNA. ESTs 
longer than 150 bp were found to be the most useful for similarity 
searches and mapping. 
Libraries of Complementary DNAs 

Of the estimated 30,000 genes expressed in the hum- ' u • 
many „ 20,000 may encode low-abuncW *' hvmtnb ™> « 

• neurological function, is P M indication ofthl £ tf'd" 

.: . tance of genes expressed in the brain (*). ^ ^ ^ ™P°'- 

An assumption in our choice of cDMa i;k« • . " " • 

ptotinJpuM cDNA clones 3 d tTorrm?o Urand0m - 
^nofymg genes ^ conitrucdn a jj^™/' « 

sequencing from the ends of full-length cVN m Zu u ^ 
and 3' untranslated sequences) wJSd ^-"""J. 5 ' 

sequences, we hoped to" take advantage of more ^ 
comparisons, in addition to nucieoridf seouTnce conT ^ « 
discover the inherent limitation* to be oT™™ P T"** T ° 
cDNA sequencing project, we wa^ed £0 C S ^''^r 
representative cDNA libraries, identify d<SST j !. " lty ° f 
charaaeristics of the libraries a „H ? • ^ ^"'"ble 
content and accural ofsSrunlo,!""™^ 1116 info ™" 

single ftim j 0 „ r!tk £ -dviL J» ,^ , 

■fan. Prions from s „ ^' °" ' <«W 

fibroblast ccU line cDNA library and rcrn^^H! c ^ 
quenecs (Tabic 1) (12, 13). removing the common se- 



EST Characterization 

WtiaAy, EST sequences were examined for simiiarid* b ^ 
GcnBanJc nucic, add datable (14). ESTs withouc 

Number, GenBank accc£ on nnmM^ t0 chc human 8«e. 

*>ns arc' from £J °w h ^ ?~ Ma ? P*- 

EST name in GcaBanJc are *e £ LD fc J°^«*Ky ^protein; 
"ESTOO." three-digit number given here preceded by 



001 
002 
003 
264 
265 
005 
005 
237 
238 
239 
240 
242 
243 
004 
244 
245 
246 
251 
253 
254 
261 
262 
263 
266 
263 
280 
269 
278 
284 
285 
288 
362 
363 
366 
367 



β-Actin (cytoplasmic) 
β-Actin (cytoplasmic) 
β-Actin (cytoplasmic) 
γ-Actin (nonmuscle) 
γ-Actin (nonmuscle) 
GTPase 
GTPase 
ADP/ATP translocase 
Fructose-1,6-bisphosphatase 
α-2-Macroglobulin 
α-Fodrin 
α-Tubulin 
α-Tubulin 
β-Tubulin 
Amyloid A4 
Apolipoprotein J 
Breakpoint cluster region 
c-erbA-α-2 
Calelectrin 
Calmodulin 
Elongation factor-1α 
Filaggrin 
Gs protein α subunit 
Glial fibrillary acidic protein 
Gln synthetase 
Hexokinase 
High-mobility group 1 protein 
LDL receptor-related protein 
Na+,K+-ATPase α subunit 
Neurofilament light chain 
Phosphoglycerate kinase 
Ret proto-oncogene 
RhoB 
Osteonectin 
Synaptophysin (p38) 



M10277 
^.10277 
K10277 
M24241 
H24241 
K19650 
K19650 
J03591 
X07292 
M11313 
M18 627 
K00558 
X00558 
X02344 
Y00264 
J0290S 
X02596 
J03239 
J03578 
J04046 
X03558 
K24355 
X04408 
J04 569 
5f00387 
t 

X12597 

X13916 

X04297 

X056O8 

L00160 
HI 6029 
X05320 
J03040 
X06389 



17pll-qt er 

Xql3-q26 

17cen-ql2 * 

12pI3.3-pl2.3 

9q33-34 

t 

t 

21q2l. 3-22.05 

a* 

22qll- <J 12 
3p24.3 



lq2l 

20ql3.2-ql3.3 



10pll.2 



lpl3-pll 
8p21 
Xql3 
10qll.2 

5q31-q33 
Xpll.23-pll.22 



Table 1. Composition of cDNA library determine k 
EST category              Hippocampus   Subtracted    Fetal brain   Temporal cortex

Database match: human
  Mitochondrial genes      48 (12.8)    10  (8.6)      3  (7.9)       6  (7.5)
  Repeated sequences       39 (10.4)    14 (12.2)      6 (15.8)       0  (0)
  Ribosomal RNA            10  (2.7)     7  (6.0)      0  (0)        11 (13.8)
  Other nuclear genes      32  (8.6)     7  (6.0)      4 (10.5)       0  (0)
Database match: other      32  (8.6)     7  (6.0)      5 (13.2)       4  (5.0)
No database match         160 (42.8)    44 (37.9)     20 (52.6)       6  (7.5)
Polyadenylate insert       53 (14.1)    24 (20.7)      0  (0)        27 (33.7)
No insert                   1  (0.3)     3  (2.6)      0  (0)        26 (32.5)

nS£t^^^^^^ mchu™ hooking
5ST-SSS Sfej
BfawttmrtAfiJ/! I ^ VerSaJ or reve,se Primers (Applied
P«b.EhLu . < ? Ck , ^ uenan « P^racol (70), carried out a
awomar^nJI 1 '™ 11 CyC,er ' ^"^S «crioni w« run on a 3 73 A
SfcLST" ^P^j 1 Biosy.tenu). Some ^enSng r« c
matches were translated in all six reading frames, and each transla-
tion was compared with the protein sequence database Protein
Information Resource (PIR) and the ProSite protein motif database



Table 3. EST similarities in the GenBank and PIR databases. All significant
similarities (P < 0.01) with GenBank or PIR entries are listed. Matches
indicate percent identical bases for nucleotides and percent similarity (iden-
tical plus conservative substitutions) for peptides. Number indicates the
accession number or locus name of the matched sequence. Abbreviations
used are as follows: B, bovine; BM, Brugia malayi; BMDV, bovine mucosal
disease virus; C, chicken; CE, Caenorhabditis elegans; D, Drosophila melano-
gaster; E, E. coli; H, human; L, lamprey; M, mouse; N, Neurospora crassa; P,
pig; PP, Pseudomonas putida; PRV, Pseudorabies virus; R, rat; S, squid; T,
Torpedo californica; TN, transposon Tn4556; X, Xenopus laevis; Y, yeast;
UT, untranslated; MARCKS, myristoylated alanine-rich C kinase substrate;
HPRT, hypoxanthine-guanine phosphoribosyltransferase; GTP, guanosine
triphosphate; LAMP, lysosomal-associated membrane protein; tRNA, trans-
fer RNA; snRNP, small nuclear ribonucleoprotein; IGF, insulin-like growth
factor; Mito, mitochondrial; DBP, albumin promoter D site-binding pro-
tein; and Pol, polymerase. EST names in GenBank are the three-digit
number given here preceded by "EST00."
EST   Description                               Length   Match    Number

Nucleotide similarities (GenBank)

247   80-87 kD MARCKS (B)                         277     61.5    M24638
377   Mito ATPase 8 subunit (B)                   421     85.1    X06088
248   p ADP-ribosyl transferase substrate (B)     256     60      M27278
256   Enhancer of split (D)                       264     71      M20571
257   Kinesin (D)                                 263     70.4    K24441
259   Xotch (X)                                   435     75.4    M33874
270   β-Tubulin (H)                               495     82.3    X00734
271   α-Actinin (H)                               272     85      X15804
273   Apolipoprotein A-I 5'-UT (H)                110     69      M20656
274   HPRT 3'-UT (H)                               85     75      M26434
275   Krüppel-related Zn2+ fingers (H)             88     67      M20678
276   LAMP-1 (H)                                  257     71.5    J04182
289   Aconitase (P)                               318     89      J05224
293   ras-like (*)                                 71     74      X01669
295   IGF-binding protein 5'-UT (R)               115     77.3    J04466
299   ras-like (R)                                138     57      X06889
300   RP L30 (R)                                  189     89      K02932
301   RP S10 (R)                                  273     90.8    X13549
365   UT conserved sequence element (H)            85     81      M24686
368   Electromotor neuron protein (T)             112     64      M30271
371   Maternal G10 mRNA (X)                       234     80      X15243
372   Catalase T (Y)                               65     72.3    X04625
374   RNA Pol II 6th subunit (Y)                  216     64.7    M33924

Peptide similarities (PIR)

247   80-87 kD MARCKS (B)                          62     82.3    S08341
377   Mito ATPase A subunit (B)                    97     92.8    S00763
249   GTP-binding protein smg p25A (B)             98     89.8    A35652
375   Genome polyprotein (BMDV)                    27     74.1    GNWVBV
250   60K filarial antigen (BM)                   109     78.0    A28209
252   Collagen 1 (CE)                              57     57.9    A31219
255   Cadherin, neuronal (C)                       42     64.3    A29964
256   Enhancer of split (D)                        87     78.2    A30047
259   Notch (D)                                   102     72.5    A24768
260   Mobilization protein MbeA (E)                47     63.8    S04790
272   Ankyrin (H)                                  84     60.7    A35049
271   α-Actinin (H)                                69     95.5    S05503
275   Finger protein XlcGT20-1 (X)                 30     80.0    S06565
279   Elongation factor Tu (*)                     24     79.2    S06703
281   Monophenol monooxygenase (M)                 29     69.0    rRMSCS
282   Neurogenic receptor trkB (M)                 56     83.4    A35104
283   U1 snRNP 70K protein (M)                     59     57.6    S04336
286   Leu-tRNA ligase (N)                          48     58.3    A33475
287   Processing-enhancing protein (N)             97     79.4    S03968
289   Aconitase (P)                               106     98.1    A35544
290   Pro-rich protein (clone cP7) (*)             56     64.3    £25372
291   NtrA (PP)                                    31     61.3    JG0338
292   IE180 protein (PRV)                          22     86.4    toBE'rV
293   ras-like (*)                                 53     58.3    B34788
294   Alcohol sulfotransferase (R)                 35     71.4    A33569
296   Transcriptional activator DBP (R)            39     74.4    A34894
297   Myosin heavy chain (R)                       60     58.3    MWRXS
298   Protein-tyrosine phosphatase (R)             22     86.4    A34845
299   ras-like (R)                                 55     58.2    TVHCRR
300   RP L30 (R)                                   58     98.3    S11622
301   RP S10 (R)                                   67     97.0    S01881
364   Fibrinogen γ chain (L)                       35     77.1    FGLHGS
257   Kinesin (S)                                         91.4    A35075
368   Electromotor neuron protein (T)              32     81.3    B33319
369   Hypothetical protein (TN)                    37     64.9    JQ0431
370   Various actins (*)                           37     75.7    S06062
371   Maternal G10 mRNA (X)                        39     94.9    S05955
373   Hypothetical protein (*)                     24     75.0    C27061
374   RNA Pol II 6th subunit (Y)                   73     30.4    B34588




(14). Comparisons with the ProSite motif database were done by
means of the program MacPattern from the EMBL Data Library
(14). GenBank and PIR searches were conducted with our modi-
fications of the "basic local alignment search tool" programs for
nucleotide (BLASTN) and peptide (BLASTX) comparisons (15).
These modifications permit many query sequences to be automati-
cally searched in a sequential fashion. PIR searches were run on the
National Center for Biotechnology Information BLAST network
service. The BLAST programs contain a rapid database-searching
algorithm that searches for local areas of similarity between two
sequences and then extends the alignments on the basis of defined
match and mismatch criteria. The algorithm does not consider the
potential of gaps to improve the alignment, thus sacrificing some
sensitivity for a 60- to 80-fold increase in speed over other database-
searching programs such as FASTA (16).
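
The seed-and-extend idea described above can be illustrated with a minimal sketch. This is not the BLAST implementation: the word size, the match and mismatch scores, and the drop-off threshold are illustrative assumptions, and the function names are hypothetical. It only shows how exact word hits can be extended in both directions without gaps.

```python
from collections import defaultdict

def seed_hits(query, subject, word=4):
    """Yield (q_pos, s_pos) for every exact word shared by the two sequences."""
    index = defaultdict(list)
    for j in range(len(subject) - word + 1):
        index[subject[j:j + word]].append(j)
    for i in range(len(query) - word + 1):
        for j in index.get(query[i:i + word], ()):
            yield i, j

def extend(query, subject, i, j, word=4, match=1, mismatch=-2, drop=5):
    """Extend one seed left and right without gaps.

    Returns (score, q_start, q_end) of the best ungapped segment found.
    All scoring parameters are assumed values chosen for illustration.
    """
    def walk(qi, sj, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= drop:
                break  # score fell too far below the best seen so far
            qi += step
            sj += step
        return best, best_len

    right, r_len = walk(i + word, j + word, 1)
    left, l_len = walk(i - 1, j - 1, -1)
    return word * match + left + right, i - l_len, i + word + r_len

def best_ungapped(query, subject, word=4):
    """Return the highest-scoring ungapped segment over all seeds."""
    hits = (extend(query, subject, i, j, word) for i, j in seed_hits(query, subject, word))
    return max(hits, default=(0, 0, 0))

print(best_ungapped("ACGTACGTTT", "GGACGTACGTAA"))
```

Because no gap is ever opened, a single insertion or deletion ends the extension, which is the loss of sensitivity the text refers to.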

Sequence similarities identified by the BLAST programs were 
considered statistically significant with a Poisson P-value <0.01. 
The Poisson P-value is the probability of as high a score occurring
by chance, given the number of residues in the query sequence and 
the database. After the BLASTN search, 30 unmatched ESTs were 
compared against GenBank by FASTA to determine if significant 
matches were missed because of the use of BLASTN for the database 
search. No additional statistically significant matches were found. 
Statistical significance does not necessarily mean functional similar- 
ity; some of the matches reported here may indicate the presence of 
a conserved domain or motif or simply a common protein structure 
pattern. Statistically significant matches to GenBank and PIR are
reported in Tables 2 and 3. The length and percent identity or 
similarity of each alignment is given in Table 3 to aid in evaluation 
of match quality. 
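
As a rough illustration of the significance test, the sketch below converts an expected number of chance hits into a Poisson P-value, P = 1 - exp(-E). The text does not give the formula used to obtain E; the form E = K·m·n·exp(-λS) and the values of K and λ here are assumptions borrowed from standard local-alignment statistics, not parameters reported by the authors.

```python
import math

def expected_hits(score, query_len, db_len, K=0.1, lam=0.3):
    """Expected number of chance segments scoring at least `score`.

    K and lam are placeholder values (assumptions), not the paper's.
    """
    return K * query_len * db_len * math.exp(-lam * score)

def poisson_p(expected):
    """Probability of at least one chance hit when `expected` are expected."""
    return -math.expm1(-expected)

def is_significant(score, query_len, db_len, threshold=0.01):
    """Apply the P < 0.01 cutoff used in the text."""
    return poisson_p(expected_hits(score, query_len, db_len)) < threshold
```

For small E the P-value is nearly equal to E itself, so a cutoff of P < 0.01 corresponds to roughly one chance match expected per hundred searches of that size.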

On the basis of database searches, the 609 EST sequences were 
classified into eight groups as shown in Table 1. Four groups, with 
197 of the sequences (32% of the total), consist of matches to 
human sequences: repetitive elements, mitochondrial genes, ribo- 
somal RNA genes, and other nuclear genes. Forty-eight of the 
sequences (8%) matched nonhuman entries in GenBank or PIR, 
whereas 230 (38%) had no significant matches. The remaining 134 
(22%) sequences contained no insert between the Eco RI cloning 
sites or consisted entirely of polyadenylate.
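
The percentages quoted in this paragraph can be checked directly from the counts it gives. The short sketch below uses only numbers from the text; the dictionary keys are labels chosen here for readability.

```python
# Counts taken from the paragraph above; key names are illustrative labels.
counts = {
    "human matches": 197,
    "nonhuman matches": 48,
    "no significant match": 230,
    "no insert or polyadenylate only": 134,
}

total = sum(counts.values())
percent = {name: round(100 * n / total) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:32s} {n:4d}  {percent[name]:3d}%")
print(f"{'total':32s} {total:4d}")
```

The four groups add up to the 609 ESTs stated in the text.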



Table 4. Matches to the ProSite motif database. Pattern matches from the
ProSite database (except posttranslational modification sites) are shown.
Abbreviations used are as follows: AA, amino acyl; HIGH, motif consensus
in single-letter amino acid code (30); ILGF, insulin-like growth factor;
DHFR, dihydrofolate reductase; EGF, epidermal growth factor; Gal-P-
UDP, galactose-1-phosphate-uridyl; C2H2, two Cys and two His residues.
EST names in GenBank are the three-digit number given here preceded by
"EST00."



094 

052,068,158,177,207,091,008,261 

112 

249* 

060, 128, 120, 139,279*, 218,063, 106 

235 

261 

187,203 

067 : _ ■'• • • .. * 

ioi ■• ■ , " : . ... . 

•193 -•■ ' .... 

071,072,095,055,070,106,025,200,221, 

107,102,114,131,260*, 290*, 291*, 294* # 

164,287\061,369* " ' ' • 

182,020,183,214,062 

226 ... 
083-' ..... ... 

188,275* 



' Motif 

AA-tWIA lig»3o "HIGri- 
ATP-binding aite A 
Ca rboxypep ti da a • / 2n a ♦ 
COX2 

Cytochrome c " 
DHFR 

Elongation factor 
EGF 

2Fe/2S Ferrodoxin 
Gal -P-UDP-tr&na f erase 
Glycoprotein hormone 
XLGF -blading protein. 
Leu tipper ... 



Nuclear, localization 
Rubredoxin 
Snake toxin " 
Zinc finger (C2K2) 



*Matches with sequences from several organisms.

*See Table 3 for ESTs with similarity to GenBank or PIR sequences.
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IS^L'^^tfCt"! : ded .~ d 

genes were aligned with sequent T-Z & 'S^ 0 ^ ™* ribosomtl 

BECTnr.Th Tfacs -de^t^^^^ pn * nra 

r"'/ 1 ""^ sequence chat was not 



wich bases 101 to 200. ^P«n«c and chus is reported beginning 



Bases from   Mismatches-            Gaps*             Accuracy
primer       ambiguities*   Insertions   Deletions      (%)

101-200         1.45           0.18        0.19         98.2
201-300         1.72           0.25        0.11         97.9
301-400         2.07           0.98        0.37         96.6
>400            3.53           2.63        1.06         92.8




Aligned 
bases 



•Error races are reported as number of rWacchcs in^ 71 ' - ^ 92.8 

^ ^ or ^ ^ ^ ^ ^ ^ ^ ^ ^ 



8130 
5404 
3197 



Thirty-six ESTs matched previously stra ,,„„j u 
genes with more dim 97% idend^Table^ F 7? n ' Uckar "" 
were from genes encodine er«2 • FoUf ° f thcsc ESTs 

bolic energ?, liriJS^Sr?rT ^ "^"^S ™ta- 
sine triphosphate, dJL^^ t . tfI ^ , ^^ 

we^otl^rB^ %^P^ "5= 

physin, giiai ^&273rs^^r^ 

light chain. At least six EST* Wf X ncur °&l*ment 

irfvo.ved in ^ ^^ JT^ T^ 5 P"** 
phodiesterase (CNPase) rT l"^ 00 ^ 3 '-P hos " 

ders (*). More than half of the hun^SeJt^ ^ 
mapped to chromosomes, indicating the™?5IS.^ W 
toward weU-studj-ed genes and proteins. ^ COCnCS 

ESTs without significant GcnRanlr 
to Che ProSite dataSe rt*^^™ t° 
posrtranslarionaj modification Stures £ T.°f °' 
motifs from the database (T,S 1 ! COntaincd 
scores or even, as in t^ case of L I found « 



Rg. 1. Sequence alignments of ESTs with Droso- 
ph,la neurogenic genes. ESTs and EST transla- 
oont were afcgned with nucleotide and peptide 

wtn the GCG program BESTFIT. The Decode 
alignment (iO) of EST002S9 with H2 

KT0O2S9 product with ^dLJuunEI 
43.382% .denocy; S gaps. (D) EST002S9 prod- 

?U,t *«** P^"" (M33874)' 

!2.143% sumlanty; 75.714% idenol 4 ««' 

^ps have been mtroduced to increase ,den«& 

<rs in parentheses indicate the GenBanJc acces- 
on numbers. Symbol, between UnesTd*^ 
*c« .denary; doub.e dot, indicate a sinS° 



Sue oncogene-related sequences were also among thfcDNA iP" 
sequenced. EST00299 and EST00283 showed XSw^ ^ 
wrehted genes, and EST00248 matched the 3' ZZn.\ ^ 

tcrasc. We also observed similarities with a WW ^Y 1 ™™ 
RNA polymerase subunit and Tc^^^S^Z 
ron-assoaated protein. Two EST/may 4««enn^eS^ 
known human gene families: EST00270 S^hed STdStS? 

Sth 8 T%^ ^ 91 h % Md ^° 0271 -cched a P a^ 

witn 85% idennty at the nucleodde Jevci 

c l *«wniy or x:ois to the Drosophth eenes No£r;A and 

and EST00259 w,th the Dro^iia genes are shown in Fig 1 Both 
genes are part of a signal cascade encoded by the "neuroeeSc"'«n« 
dm are solved in the differentiation of neuronal and 2S£S 5 

(M). It has been proposed that rhc Enhancer of split protein interacts 

Snvert"^ T ^ » ^ P rod ^of the Notch g^To 
convert a developmental s.gnal into an altered partem of eenl 
express^ („,. EST002S6 matched near the 5' eJSS EnW 
S ^ SCqUenCC ' aWa ^ fr0m ^ mammalian G protein B 

mate? IT*-^ e ' CmenC Pi " of KnK»59 
match to Notch is in the cdclO/SWU region that is similar to three 
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338 AMCO>tfnSt 



1 TIiTnTJfT'f??*^^ 



flUarirv score nfn'iVn'rc 000 re P feSCT « » "^^^^-is^^ieoo^ ,«! L^^^ %T ^^'''f^ff <P11 ^ 

-nuanty score of 0.1 to 0.4. Scores are from pairwise sJignmerts ba^rf . >*^™*^.™»»**Li£^^ 

S4 -ion the matrix of Schwara and D*yboT (31) as modified in the GCG package. 
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Fig. 2. Chromosome 
segregation of ESTs 
mapped by PCR. Chro- 
mosomes and ESTs are 
as follows: 1 (293*, 012, 
077, 053, 079, 202, 
086), 2 (021, 037, 234),
3 (248*, 257*, 274*,
062), 4 (009, 038), 5
(026, 030, 104, 123), 6
(301*, 007, 219, 023,
356), 8 (245*, 223), 10
(024, 197, 131), 11
(016, 111), 12 (014), 13 (372*, 273*, 200), 14 (221, 201, 008), 15 (165),
16 (373*), 17 (068), 19 (368*, 080), and X (276*). PCR conditions were
as follows: 60 ng of genomic DNA was used as a template for PCR with 80
ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 μCi of
α-32P-labeled deoxycytidine triphosphate. The PCR was performed in a
microplate thermocycler (Techne) under the following conditions: 30 cycles
of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at
72°C for 10 min. The amplified products were analyzed on a 6% polyacryl-
amide sequencing gel and visualized by autoradiography. Asterisks indicate
those ESTs with similarity to GenBank or PIR sequences (Tables 2 and 3).
EST names in GenBank are the three-digit number given here preceded by
"EST00."



cell-cycle control genes in yeast and is tightly conserved in the
Xenopus laevis Notch homolog, Xotch. In Drosophila, Enhancer of split
is required for formation of epidermal tissue. Notch contains several
epidermal growth factor-like repeats and appears to be involved in
cell-cell communication during development (20).

Seven genes were represented by more than one EST. Compari- 
sons of all the ESTs against one another revealed two overlaps of 
unknown ESTs: EST00233 and EST00234 matched in opposite 
orientations, and EST00235 and EST00236 matched in the same 
orientation beginning at the same nucleotide. Five human genes 
were represented by more than one EST: β-actin (three), γ-actin
(two), α-tubulin (two), α-2-macroglobulin (two), and CNPase
(two).



Mapping of ESTs to Human Chromosomes 

We used the polymerase chain reaction (PCR) to screen a series of 
somatic cell hybrid cell lines containing defined sets of human 
chromosomes for the presence of a given EST (21). In this process, 
only the hybrids that contain the human gene corresponding to the 
EST will yield an amplified fragment. An EST is assigned to a
chromosome by analysis of the segregation pattern of PCR products
from hybrid DNA templates. The single human chromosome
present in all hybrids that give rise to an amplified fragment is the
location of the EST.
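
The assignment rule can be sketched as a set intersection. The hybrid panel below is entirely hypothetical (the names and chromosome contents are invented for illustration), and the extra step of discarding chromosomes found in PCR-negative hybrids is an assumption added here, not something stated in the text.

```python
def assign_chromosome(panel, positive):
    """Return the candidate chromosomes for one EST.

    panel:    dict mapping hybrid name -> set of human chromosomes it carries
    positive: set of hybrid names that gave an amplified fragment
    """
    pos = [set(chroms) for name, chroms in panel.items() if name in positive]
    neg = [set(chroms) for name, chroms in panel.items() if name not in positive]
    if not pos:
        return set()
    # Chromosomes present in every PCR-positive hybrid ...
    candidates = set.intersection(*pos)
    # ... and, as an added assumption, absent from every PCR-negative hybrid.
    for chroms in neg:
        candidates -= chroms
    return candidates

# Hypothetical four-hybrid panel; real panels are larger.
panel = {
    "hybrid_A": {"1", "8", "X"},
    "hybrid_B": {"8", "12"},
    "hybrid_C": {"1", "12"},
    "hybrid_D": {"8", "X"},
}
print(assign_chromosome(panel, {"hybrid_A", "hybrid_B", "hybrid_D"}))
```

An empty result or a result with several chromosomes would indicate that the panel cannot resolve the location and that more hybrids are needed.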

PCR mapping has been applied to 46 clones, as summarized in
Fig. 2. The EST of the human gene for apolipoprotein J (also called
SP-40,40 complement-associated protein and sulfated glycoprotein
2) was localized to chromosome 8. Eleven other ESTs with GenBank
or PIR similarities were mapped to chromosomes. Although
PCR mapping of somatic cell hybrids is relatively rapid (up to three
clones can be assigned per day with a single thermal cycler), it is
relatively expensive, costing about ten times as much as EST
sequencing. With the same oligonucleotide primers, sublocalization
can be achieved with panels of fragments from specific chromosomes
or pools of large genomic clones in an analogous manner. Other
mapping strategies that have been proposed are multiplex in situ
hybridization, prescreening with labeled flow-sorted chromosomes,
and preselection by hybridization to construct chromosome-specific
cDNA libraries. However, these methods are limited by the purity
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for detection. 
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Automated DNA Sequencing Accuracy and 
GenBank Submission 

ESTs that match human sequences in GenBank are excellent tools
for the analysis of the accuracy of double-strand automated DNA
sequencing. Ninety EST-GenBank matches were examined for the
number of nucleotide mismatches and gaps required to achieve 
optimal alignment by the Genetics Computer Group (GCG) pro- 
gram BESTFIT (22). The number of mismatches, insertions, and 
deletions was counted for each hundred bases of the sequence (Table 
5). As expected, the sequence quality was best closest to the primer 
and decreased rapidly after about 400 bases. The number of 
deletions and insertions relative to the GenBank reference sequence 
increased five- to tenfold beyond 400 bases, whereas the number of 
mismatches doubled. The average accuracy rate for individual 
double-stranded sequencing runs was 97.7% for up to 400 bases. 
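
The accuracy column of Table 5 appears to follow from the three error columns, assuming the error rates are expressed per hundred aligned bases. The sketch below recomputes it from the tabulated values; the interpretation of the units is an assumption.

```python
# (mismatches-ambiguities, insertions, deletions) per 100 bases, from Table 5.
error_rates = {
    "101-200": (1.45, 0.18, 0.19),
    "201-300": (1.72, 0.25, 0.11),
    "301-400": (2.07, 0.98, 0.37),
    ">400":    (3.53, 2.63, 1.06),
}

def accuracy(rates):
    """Percent accuracy, taken as 100 minus the summed error rates."""
    return round(100.0 - sum(rates), 1)

for interval, rates in error_rates.items():
    print(f"{interval:8s} {accuracy(rates):5.1f}%")
```

The recomputed values agree with the tabulated 98.2, 97.9, 96.6, and 92.8%.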

The minimum criteria for submission of ESTs to GenBank were
that sequences be at least 150 bases in length and contain <3% 
ambiguous base calls. The overall accuracy of sequences submitted 
from each template group was at least 97%, based on matches to 
known human genes. Three hundred forty-eight ESTs met these 
criteria and were submitted to GenBank with accession numbers 
M61953 through M62300, inclusive. All ESTs except those match- 
ing mitochondrial or ribosomal RNA (rRNA) genes and simple 
repetitive elements were submitted to GenBank. 
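
The two submission criteria translate into a simple filter. The sketch below assumes that any character other than A, C, G, or T counts as an ambiguous base call; the function name is hypothetical. It also checks that the quoted accession range contains exactly 348 entries.

```python
def meets_submission_criteria(seq, min_len=150, max_ambiguous=0.03):
    """True if the read is at least 150 bases with <3% ambiguous calls.

    Assumption: anything other than A, C, G, T is an ambiguous call.
    """
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    ambiguous = sum(base not in "ACGT" for base in seq)
    return ambiguous / len(seq) < max_ambiguous

# Accessions M61953 through M62300, inclusive.
n_accessions = 62300 - 61953 + 1
print(n_accessions)
```

The accession count matches the 348 ESTs reported as submitted.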



Conclusions and Prospects 

Single-run DNA sequencing has proven to be an efficient method
of obtaining preliminary data on cDNA clones. Our results demon-
strate that sufficient information is contained in 150 to 400 bases of
a nucleotide sequence from one sequencing run for preliminary
identification of the cDNA and localization to a chromosome. In
addition to the 35 ESTs homologous to known human genes, 48
ESTs matched sequences in GenBank or PIR with moderate to
striking similarity, including high-quality matches with genes from
such evolutionarily distant organisms as yeast (EST00374) and
Neurospora (EST00287) (Table 3).

Two hundred thirty ESTs did not match any current database 
entries and therefore represent new, previously uncharacterized 
genes. A multitude of approaches for classifying these genes exists, 
including complete sequencing and expression, chromosome map- 
ping, tissue distribution, and immunological characterization. Cur- 
rently unidentified cDNAs will also be classified by similarity to 
genes from other organisms as those sequences become available. 
Three ESTs reported here (EST00257, EST00259, and EST00374)
were identified by similarity to sequences that have appeared since 
the last full release of GenBank. 

The random selection approach used here revealed an unaccept- 
ably large number of highly represented clones in these cDNA 
libraries. Over 30% of the clones from the hippocampus cDNA 
library consisted of rRNA, mitochondrial cDNAs, or inserts con- 
sisting entirely of polyA. Sixty-eight ESTs matched 12 different 
mitochondrial genes, including 18 matches to cytochrome oxidase I. 
Although elimination of these uninformative clones is a priority for
developing ideal cDNA libraries, techniques to reduce repeated 
sequencing of clones will become increasingly important as large 
numbers of cDNAs are sequenced. The use of library preprocessing 
techniques such as subtraction, which preferentially reduces the 
population of certain sequences in the library (11, 12), and normal-
ization, which results in all sequences being represented in approx-
imately equal proportions in the library (23), should reduce repeated
sequencing of high and intermediate abundance clones and maxi-
mize the chances of finding rare messages from specific cell popula-
tions. In our initial experiments with subtractive hybridization of
the hippocampus library with a human fibroblast cDNA library, 
CNPase and GFAP clones were enriched greater than tenfold and 
twofold, respectively. Another characteristic of the ideal cDNA 
library would be directional cloning so that either a coding sequence 
or a 3' noncoding sequence could be selectively obtained. 

The EST data, in conjunction with physical mapping, will provide 
a high resolution map of the location of genes along chromosomes, 
a map that would be more costly to construct by genomic sequenc- 
ing and analysis. By performing a single DNA sequencing reaction 
on each cDNA clone, a key piece of information was obtained for 
the relatively low cost of about $0.12 to $0.15 per base. The EST
approach will provide a new resource for the analysis of chromo- 
some sequence and for human gene discovery. 

The screening of cDNA clones to identify the protein comple- 
ment of a tissue has been explored by others to a limited extent. In 
1983, Putney and co-workers sequenced over 150 clones from a 
rabbit muscle cDNA library and identified clones for 13 of the 19 
known muscle proteins, including one new isotype, but no un- 
known coding sequences (24). Over 400 adult head-specific cDNA 
clones from Drosophila have been identified by differential screening 
of cDNA libraries from different developmental stages (25). Im- 
provements in DNA sequencing technologies have now made 
feasible essentially complete screening of the expressed gene com- 
plement of an organism. 

In our own laboratory, the EST approach should result in the 
partial sequencing of most human brain cDNAs in a few years. 
Similar approaches begun elsewhere (26) could result in a database 
of most human expressed genes in less than 5 years. The presence of 
these minimally characterized sequences in GenBank will assist 
research efforts in several areas of biology. The EST database will 
provide identification and confirmation of coding regions in naive 
genomic sequences. Sublocalization of cDNAs that have been 
mapped to chromosomes will help define the genetic content of 
specific chromosomal regions and permit correlation with patterns 
of inheritance in genetic disease. In a related experiment, chromo-
some sublocalization was the key to establishing that the γ-ami-
nobutyric acid-benzodiazepine receptor β3 subunit is deleted in
individuals with Angelman-Prader-Willi syndrome (27). We antic-
ipate that ESTs from human brain will further the identification of
genes associated with other neurological diseases and will provide a 
more complete view of gene expression in the brain. 
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ABSTRACT Polymerase chain reaction, using thermo- 
stable DNA polymerase, has revolutionized DNA diagnostics. 
Another thermostable enzyme, DNA ligase, is harnessed in the 
assay reported here that both amplifies DNA and discriminates 
a single-base substitution. This cloned enzyme specifically links 
two adjacent oligonucleotides when hybridized at 65°C to a 
complementary target only when the nucleotides are perfectly 
base-paired at the junction. Oligonucleotide products are ex- 
ponentially amplified by thermal cycling of the ligation reaction 
in the presence of a second set of adjacent oligonucleotides, 
complementary to the first set and the target. A single-base 
mismatch prevents ligation/amplification and is thus distin-
guished. This method was exploited to detect 200 target mol-
ecules as well as to discriminate between normal βA- and sickle
βS-globin genotypes from 10-μl blood samples.



DNA diagnostics uses the tools of molecular biology to 
identify nucleotide substitutions, deletions, or insertions in 
genes of medical interest (1). A reliable DNA diagnostics 
method will require faithful amplification of target sequences, 
accurate single-base discrimination, low background, and, 
ultimately, complete automation. The initial target nucleic 
acid amplification may be accomplished by using the poly- 
merase chain reaction (PCR) (2), self-sustained sequence 
replication (3), or ligase amplification reaction (4, 5). Subse- 
quently, single-base mismatches may be detected via allele- 
specific and reverse oligonucleotide hybridization (6, 7), 
denaturing gradient gel electrophoresis (8), RNase or chem- 
ical cleavage of mismatched heteroduplexes (9, 10), use of 
nucleotide analogs (11), or fluorescence PCR amplification/ 
detection (12). 

Landegren et al. (13) have pioneered an oligonucleotide 
ligation assay to circumvent the need for electrophoresis or 
precise hybridization conditions. Two oligonucleotide 
probes are hybridized to denatured DNA, such that the 3' end 
of the first one is immediately adjacent to the 5' end of the 
second probe. DNA ligase can covalently link these two 
oligonucleotides, provided that the nucleotides at the junc- 
tion are perfectly base-paired to the target (4, 5, 13, 14). A 
single-nucleotide substitution can, therefore, be distin- 
guished. Use of biotin on the first probe and a suitable 
nonisotopic reporter group on the second probe allows for 
product capture and detection (13) in a manner amenable to 
automation. 

Ideally, the oligonucleotides should be sufficiently long 
(20-25 nucleotides) so that each will preferentially hybridize 
to its unique position on the human genome. The specificity 
of ligation should be particularly enhanced by performing the 
reaction at or near the melting temperature (Tm) of the two
oligonucleotides. At higher temperatures a single-base mis- 
match at the junction forms not only an imperfect double 
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helix but also destabilizes hybridization of the mismatched 
oligonucleotide. 

This report describes DNA detection that uses a thermo- 
stable ligase to exquisitely discriminate between a mis- 
matched and complementary DNA helix (Fig. 1 Upper). 
Because the enzyme retains activity after multiple thermal 
cycles, the ligations may be repeated to linearly increase 
product [termed ligase detection reaction (LDR)]. Product 
may be further amplified in a ligase chain reaction (LCR) by 
using both strands of genomic DNA as targets for oligonu- 
cleotide hybridization. Two sets of adjacent oligonucleo- 
tides, complementary to each target strand, are used. The 
ligation products from one round can become the targets for 
the next round of ligation (Fig. 1 Upper). By use of LCR, the 
amount of product can be increased in an exponential fashion 
by repeated thermal cycling. 
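
The difference between linear (LDR) and exponential (LCR) product accumulation can be made concrete with an idealized model. The sketch assumes that every target is ligated with a fixed efficiency in each cycle and ignores oligonucleotide depletion and target-independent ligation; the function names and the efficiency parameter are illustrative, not taken from the paper.

```python
def ldr_product(targets, cycles, efficiency=1.0):
    """Ligated product after LDR: each target yields one product per cycle."""
    return targets * cycles * efficiency

def lcr_product(targets, cycles, efficiency=1.0):
    """Ligated product after LCR: products serve as targets in later cycles."""
    return targets * ((1.0 + efficiency) ** cycles - 1.0)

for n in (5, 20):
    print(n, ldr_product(200, n), lcr_product(200, n))
```

Under these idealized assumptions, 200 target molecules give 4,000 product molecules after 20 LDR cycles but about 2 x 10^8 after 20 LCR cycles, which is why LCR is the more sensitive format.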

MATERIALS AND METHODS 

Thermostable Ligase. Plasmid libraries of Thermus aquat-
icus strain HB8 DNA (ATCC27634) were screened for the
ability to complement a temperature-sensitive ligts7 deriva-
tive of Escherichia coli [unpublished work; ref. 16]. One
complementing plasmid (pDZ1) contained a thermostable
ligase gene as evidenced by (i) presence of a thermostable
NAD+-dependent nick-closing (ligase) activity in crude ex-
tracts when assayed at 65°C (17) and (ii) DNA sequence
analysis of the first 60 codons of the putative gene revealed
>50% amino acid identity to E. coli ligase (18). Thermostable
ligase was purified from E. coli cells containing the ligase 
gene cloned downstream of an inducible T7 expression 
system (19), as described elsewhere (unpublished work). 
Ligase activity was assayed for the ability to seal nicked 
plasmid DNA (pUC4KIXX) as monitored by electrophoresis 
on 1% agarose gel. One nick-closing unit of ligase is defined 
as the amount of ligase that circularizes 0.5 μg of nicked
pUC4KIXX DNA in 20 μl of 20 mM Tris-HCl, pH 7.6/50 mM
KCl/10 mM MgCl2/1 mM EDTA/10 mM NAD+/10 mM
dithiothreitol overlaid with a drop of mineral oil after 15-min
incubation at 65°C.

Genomic DNA, Plasmid DNA, and Oligonucleotides. Hu-
man genomic DNA was isolated from 0.5 ml of whole blood
as described (20). Proteinase K and RNase A were removed
by sequential extractions with phenol, phenol/chloroform,
chloroform, and 1-butanol (twice), and nucleic acid was recov-
ered by precipitation with ethanol. Samples were boiled for
5 min before use in LCR assays. Plasmid DNAs containing
the βA- and βS-globin gene alleles were a gift from D.
Nickerson (California Institute of Technology, Pasadena, 
CA) and were digested with Taq I before use as target DNA. 
Oligonucleotides were assembled by the phosphoramidite 
method (21) on an Applied Biosystems model 380A DNA 
synthesizer, purified by reversed-phase HPLC, and provided 



Abbreviations: PCR, polymerase chain reaction; LDR, ligase detec- 
tion reaction; LCR, ligase chain reaction. 
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#103 CTTTTT fi 27m»r 6*°C 

• 102 CTTT I 25m*r 66°C 

#101 CT C A 23o«r «6°C 

• 107 (5') C C (3*) 22m«r 70°C 

0* C lob In CACACC ATC CTC CAC CTC ACT CCT CAC CAC AAC TCT CCC CTT ACT CCC CTC 

CTCTCC TAG CAC CTC CAC TCA CCA CIC CTC TTC ACA CCC CAA TCA CCC CAC 

#109 (3') T C (5') 22c«r 70°C 

•10* I C TC 24oer 68°C 

•105 A TTTC Z6a«c 65°C 

•10« £ TTTTTC 28otr 66°C 



fi^ C lob In Met Vsl His Lou Thr ?ro Cltj Clu Ly» Ser Als V«l Thr All Leu 

pS Clobln Vsl 
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by R. Kaiser and S. Horvath (California Institute of Tech- 
nology, Pasadena, CA). Oligonucleotide sequences (5-3') 
are: 101, GTCATGGTGCACCTGACTCCTGA;
102, GTTTCATGGTGCACCTGACTCCTGT;
103, GTTTTTCATGGTGCACCTGACTCCTGG;
104, CTGCAGTAACGGCAGACTTCTCCT;
105, CTTTGCAGTAACGGCAGACTTCTCCA;
106, CTTTTTGCAGTAACGGCAGACTTCTCCC;
107, GGAGAAGTCTGCCGTTACTGCC;
109, CAGGAGTCAGGTGCACCATGGT. (See Fig. 1.)
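
A quick way to estimate the melting temperature of short oligonucleotides such as these is the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C. The text does not say how the temperatures in Fig. 1 were obtained, so the use of this rule is an assumption; it does appear to reproduce the 70°C listed there for oligonucleotides 107 and 109.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (°C) by the Wallace rule.

    Intended only for short oligonucleotides; the rule is an approximation
    and is not stated in the source as the authors' method.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

oligos = {
    "107": "GGAGAAGTCTGCCGTTACTGCC",
    "109": "CAGGAGTCAGGTGCACCATGGT",
}
for name, seq in oligos.items():
    print(name, len(seq), wallace_tm(seq))
```

For the diagnostic oligonucleotides 101 to 106, only the portion that pairs with the target would be expected to contribute, so their noncomplementary 5' tails would need to be excluded before applying the rule.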

32P Labeling of Oligonucleotides. Oligonucleotides 107 or
109 (0.1 μg ≈ 15 pmol) were 5' end-labeled in 20 μl of 30 mM
Tris-HCl, pH 8.0/20 mM Tricine/10 mM MgCl2/0.5 mM
EDTA/5 mM dithiothreitol/400 μCi of [γ-32P]ATP (6,000
Ci/mmol = 60 pmol ATP, New England Nuclear; 1 Ci = 37
GBq) by addition of 15 units of T4 polynucleotide kinase
(New England Biolabs). After incubation at 37°C for 45 min,
unlabeled ATP was added to 1 mM, and incubation was
continued an additional 2 min at 37°C. The reaction was
terminated by adding 0.5 μl of 0.5 M EDTA, and the kinase
was heat-inactivated (65°C for 10 min). Unincorporated 32P
label was removed by chromatography with Sephadex G-25
preequilibrated with Tris/EDTA buffer. Specific activity
ranged from 7 to 10 × 10^8 cpm/μg of oligonucleotide.

LDR and LCR Reaction Conditions. For LDR reactions,
labeled oligonucleotide (200,000 cpm = 0.28 ng = 40 fmol)
and unlabeled diagnostic oligonucleotide (0.27 ng = 40 fmol)
were incubated in the presence of target DNA (1 fmol = 6 ×
10^8 molecules of Taq I-digested β^A- or β^S-globin plasmid) in
10 μl of 20 mM Tris-HCl, pH 7.6/100 mM KCl/10 mM
MgCl2/1 mM EDTA/10 mM NAD+/10 mM dithiothreitol/4
μg of salmon sperm DNA/15 nick-closing units of T. aquaticus
ligase and overlaid with a drop of mineral oil. Reactions
were incubated at 94°C for 1 min followed by 65°C for 4 min,
and this cycle was repeated 5 or 20 times. For LCR reactions,
unlabeled diagnostic oligonucleotide pairs (101 and 104, 102
and 105, or 103 and 106; 40 fmol each) and adjacent pairs of
labeled oligonucleotides (107 and 109, 40 fmol each) were



Fig. 1. (Upper) Diagram depicting DNA amplification/detection
by using LCR. DNA is heat denatured, and four complementary
oligonucleotides are hybridized to the target at a temperature
near their melting temperature (65°C). Thermostable ligase
will covalently attach only adjacent oligonucleotides that are
perfectly complementary to the target (Left). Products from one
round of ligations become targets for the next round, and
thus products increase exponentially. Oligonucleotides
containing a single-base mismatch at the junction do not ligate
efficiently and, therefore, do not amplify product (Right).
(Lower) Nucleotide sequence and corresponding translated
sequence of the oligonucleotides used in detecting β^A- and
β^S-globin genes. Oligonucleotides 101 and 104 detect
the β^A target, whereas oligonucleotides 102 and 105
detect the β^S target when ligated to labeled oligonucleotides
107 and 109, respectively. Oligonucleotides 103 and 106 were
designed to assay the efficiency of ligation of G-T or G-A and
C-A or C-T mismatches when using β^A- or β^S-globin gene
targets, respectively. Oligonucleotides have calculated Tm
values of 66-70°C (15), just at or slightly above ligation
temperature. The diagnostic oligonucleotides (101-106)
contained slightly different length tails to facilitate
discrimination of various products when separated on
polyacrylamide denaturing gel.

incubated in the presence of ligase and target DNA (ranging 
from 100 amol to less than one molecule per tube) with 20 or 
30 cycles as described above. 
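
The mass, molar, and molecule equivalences quoted in the methods above can be reproduced with a short calculation. This is a sketch under one stated assumption: an average mass of about 330 g/mol per nucleotide, which is a common rule of thumb and is not given in the text. Function names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
AVG_NT_MASS = 330.0        # g/mol per nucleotide (assumed rule-of-thumb value)


def oligo_fmol(mass_ng: float, length_nt: int) -> float:
    """Convert a mass of single-stranded oligonucleotide (ng) to fmol."""
    moles = (mass_ng * 1e-9) / (length_nt * AVG_NT_MASS)
    return moles * 1e15


def molecules_from_fmol(fmol: float) -> float:
    """Number of molecules in a given amount expressed in fmol."""
    return fmol * 1e-15 * AVOGADRO


def specific_activity_cpm_per_ug(cpm: float, mass_ng: float) -> float:
    """Specific activity in cpm per microgram of oligonucleotide."""
    return cpm / (mass_ng * 1e-3)


if __name__ == "__main__":
    print(oligo_fmol(0.28, 22))                       # close to the quoted 40 fmol
    print(molecules_from_fmol(1.0))                   # close to 6 x 10^8 molecules
    print(specific_activity_cpm_per_ug(200_000, 0.28))  # within 7-10 x 10^8 cpm/ug
```

Under that assumption the 22-mer figures agree with the text: 0.28 ng is roughly 40 fmol, and 200,000 cpm in 0.28 ng falls within the stated specific-activity range.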

Electrophoresis. Samples (4 μl) were in 45% formamide and
denatured by boiling for 3 min before loading (40,000 or
80,000 cpm/lane). Electrophoresis was in 10% polyacrylamide
gel containing 7 M urea in a buffer of 100 mM Tris borate,
pH 8.9/1 mM EDTA for 2 hr at 60 W constant power. After
removing urea, gels were dried and autoradiographed overnight
at -70°C on Kodak XAR-5 film with the aid of a Cronex
intensifying screen (DuPont).

RESULTS 

The gene encoding human β-globin was selected as a model
system to test ligation amplification and detection. The
normal β^A and sickle β^S genes differ by a single A→T
transversion that leads to a change of a glutamic acid residue
to a valine in the hemoglobin chain [Fig. 1, Lower (22)].
Diagnostic oligonucleotides containing the 3' nucleotide
unique to each allele were synthesized with different-length
5' tails (Fig. 1 Lower). Upon ligation to the invariant
32P-labeled adjacent oligonucleotide, the individual products
could be distinguished when separated on a polyacrylamide
denaturing gel and detected by autoradiography.

Specificity of Thermostable Ligase. The specificity of ligating
oligonucleotide pairs on a target DNA with perfect
complementarity was directly compared with each possible
mismatch (see Fig. 2 and Table 1). Results show that T.
aquaticus ligase efficiently links correctly base-paired
oligonucleotides and gives near zero ligation in the presence of a
mismatch (Table 1). When only 1 fmol of target DNA was
used under LDR conditions, the worst mismatches were
1.5-1% (G-T, T-T), whereas other mismatches were <0.4%
(A-A, C-T, G-A, C-A) of the products formed with
complementary oligonucleotide base pairs (A-T). This is
substantially better than found for mesophilic T4 or E. coli ligase
when using similar radioactive detection methods (13, 14).

Fig. 2. Autoradiogram showing specificity of T. aquaticus ligase
under LDR and LCR amplification conditions. Specificity was assayed
by ligation of diagnostic oligonucleotides in the presence of either
complementary or mismatched β^A- or β^S-globin gene target DNA (LDR
amplification). Ligation of diagnostic oligonucleotides 101 (β^A
allele), 102 (β^S allele), or 103 to labeled 107 gives lengths of 45,
47, or 49 nucleotides, respectively. For the complementary strand,
ligation of diagnostic oligonucleotides 104 (β^A allele), 105 (β^S
allele), or 106 to labeled 109 gives lengths of 46, 48, or 50
nucleotides, respectively. The diagnostic oligonucleotide listed in
each lane and the appropriate adjacent labeled oligonucleotide (40
fmol each) were incubated with target DNA (1 fmol = 6 × 10^8
molecules of Taq I-digested β^A- or β^S-globin plasmid), as
described. In LCR amplification, samples contained pairs of
diagnostic oligonucleotides (β^A allele-specific 101 and 104, β^S
allele-specific 102 and 105, or "C-G pair" 103 and 106), both labeled
oligonucleotides (107 and 109), and were incubated with ligase and 10
amol of target DNA (6 × 10^6 molecules; 100-fold less than for LDR)
as described. Samples were loaded in groups of eight and run into the
gel; then the next set was loaded. This accounts for the "slower"
migration of bands on the right side of the autoradiogram.
(Intensifying screen was not used for this autoradiogram.) Bands were
excised from the gel and assayed for radioactivity (Table 1).

In the amplification/detection (LCR) experiments, four
oligonucleotides were incubated with ligase and 10 amol of
target DNA (see Fig. 2 Right and Table 1 lower part). The 3'
nucleotide of each unlabeled diagnostic oligonucleotide was
either complementary or mismatched to the target DNA and
yet was always complementary to its pair, i.e., A-T for 101
and 104, T-A for 102 and 105, and G-C for 103 and 106.
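
The base pairs formed at the ligation junction can be tabulated for each diagnostic oligonucleotide and allele. The sketch below is illustrative: the 3'-terminal bases come from the listed sequences, while the facing target bases are inferred from the A→T difference between the alleles (oligonucleotides 101-103 read as sense-strand sequence, 104-106 as the complementary strand). That inference and all names are assumptions of this sketch.

```python
# 3'-terminal base of each diagnostic oligonucleotide (from the listed sequences).
THREE_PRIME_BASE = {101: "A", 102: "T", 103: "G", 104: "T", 105: "A", 106: "C"}

# Target base facing the 3' nucleotide. Oligos 101-103 carry sense-strand
# sequence and anneal to the antisense strand; 104-106 anneal to the sense
# strand. The sense-strand base is A in the normal allele and T in sickle.
FACING_BASE = {
    "A": {"sense_oligo": "T", "antisense_oligo": "A"},   # beta-A target
    "S": {"sense_oligo": "A", "antisense_oligo": "T"},   # beta-S target
}

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def junction_pair(oligo: int, allele: str):
    """Return ('oligo base-target base', is_complementary) for one oligo/allele."""
    base = THREE_PRIME_BASE[oligo]
    kind = "sense_oligo" if oligo in (101, 102, 103) else "antisense_oligo"
    target = FACING_BASE[allele][kind]
    return f"{base}-{target}", (base, target) in WATSON_CRICK


if __name__ == "__main__":
    for allele in ("A", "S"):
        for oligo in sorted(THREE_PRIME_BASE):
            print(allele, oligo, *junction_pair(oligo, allele))
```

The output reproduces the pairings used in Table 1, for example G-T and C-A for oligonucleotides 103 and 106 on the normal target.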



Table 1. Quantitation of complementary and mismatched LDR and LCR

                           Oligonucleotide   Product    Mismatched/
                           base-target       formed,    complementary,
Amplification              base              %*         %†

LDR (6 × 10^8 target       A-T               21.5
molecules = 1 fmol)        T-A               13.2
                           T-A               17.9
                           A-T               12.4
                           A-A               <0.1       <0.4
                           T-T               0.12       0.7
                           T-T               0.16       1.0
                           A-A               <0.1       <0.4
                           G-T               0.30       1.4
                           C-T               <0.1       <0.4
                           G-A               <0.1       <0.4
                           C-A               <0.1       <0.4

LCR (6 × 10^6 target       A-T, T-A          41.4
molecules = 10 amol)       T-A, A-T          10.4
                           A-A, T-T          0.45       1.1
                           T-T, A-A          <0.05      <0.2
                           G-T, C-A          0.51       1.3
                           G-A, C-T          <0.05      <0.2

Bands from 20-cycle LDR and 30-cycle LCR experiments described in
Fig. 2 were excised from the gels and assayed for radioactivity.

* Percentage product formed = cpm in product band/cpm in starting
oligonucleotide band.

† Percentage mismatched/complementary = cpm in band of mismatched
oligonucleotide/cpm in band of complementary oligonucleotide when
using the same target DNA and indicates noise-to-signal ratio.
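
The noise-to-signal column of Table 1 is a simple ratio of band intensities. As a minimal sketch, the percentages of product formed can stand in for the underlying cpm values. Which complementary row each mismatch is compared against is my reading of the table rather than something the footnote spells out.

```python
def noise_to_signal_percent(mismatched: float, complementary: float) -> float:
    """Mismatched product as a percentage of complementary product."""
    return 100.0 * mismatched / complementary


def signal_to_noise_fold(noise_to_signal_pct: float) -> float:
    """Fold excess of correct over mismatched product for a given percentage."""
    return 100.0 / noise_to_signal_pct


if __name__ == "__main__":
    # LDR: G-T mismatch (0.30%) against the A-T complementary pair (21.5%).
    print(round(noise_to_signal_percent(0.30, 21.5), 1))
    # LCR: A-A, T-T mismatches (0.45%) against A-T, T-A (41.4%).
    print(round(noise_to_signal_percent(0.45, 41.4), 1))
    # A 0.2% noise-to-signal ratio corresponds to a 500-fold signal excess.
    print(signal_to_noise_fold(0.2))
```

Because the published ratios were computed from cpm rather than from rounded percentages, small differences in the last digit are to be expected for some rows.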



Four-way (target independent) ligation was minimized by use
of (i) carrier salmon sperm DNA and (ii) oligonucleotides
designed to create single-base 3' overhangs (this work, see
Fig. 1) or single-base 5' overhangs (not tested). Note that an
initial "incorrect" ligation of a mismatched oligonucleotide to
target DNA would subsequently be amplified with the same
efficiency as a correct ligation (see Fig. 1). Nevertheless, the
worst mismatches were 1.3% to 0.6% (G-T, C-A; A-A, T-T),
whereas others were <0.2% (T-T, A-A; G-A, C-T) of the
products formed with complementary base pairs (A-T, T-A).
LCR, using thermostable ligase, is thus the only method that
can both amplify and detect single-base mismatches with high
signal-to-noise ratios (4, 5).

The entire set of experiments described above was repeated
with a buffer containing 150 mM instead of 100 mM
KCl. Results were essentially the same as in Fig. 2 and Table
1; mismatches for LDR ranged from 0.6% to <0.3% and for
LCR ranged from 1.7% to <0.3% of the complementary
products (data not shown). Thus for T. aquaticus ligase,
discrimination between matched and mismatched
oligonucleotides is not critically dependent on salt conditions, in
contrast to the requirements for mesophilic ligases (4, 5, 13,
14).

Specificity of LCR DNA Amplification with Sub-amol Quantities
of Target DNA. The extent of LCR DNA amplification
was determined in the presence of target DNA ranging from
100 amol = 6 × 10^7 molecules to <1 molecule per tube (Fig.
3, Table 2). In the absence of target DNA, no background
signal was detected when carrier salmon sperm DNA (4 μg)
was present (compare last 8 lanes of Fig. 3). At higher target
concentration, DNA amplification was essentially complete
after 20 cycles, whereas at lower initial target concentration
substantially more product is formed with additional
amplification cycles. After 30 cycles of LCR, 200 molecules of
initial target DNA were amplified 1.7 × 10^5 fold and thus
could be readily detected. The average efficiency of ligation
per cycle (40-50%, calculated as described in ref. 4) could be
potentially enhanced by altering buffer conditions [such as
using NH4Cl, MnCl2, polyamines, or polyethylene glycols
(17)], enzyme concentration, or thermal-cycling times and
temperatures.
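
One way to relate the overall amplification to a per-cycle efficiency is the simple exponential model fold = (1 + E)^n. This is an assumption of the sketch below; the text calculates efficiency "as described in ref. 4", and that procedure may differ in detail.

```python
def per_cycle_efficiency(fold_amplification: float, cycles: int) -> float:
    """Average fractional yield per cycle, assuming fold = (1 + E) ** cycles."""
    return fold_amplification ** (1.0 / cycles) - 1.0


if __name__ == "__main__":
    # 200 target molecules amplified 1.7 x 10^5 fold in 30 cycles (Table 2).
    efficiency = per_cycle_efficiency(1.7e5, 30)
    print(f"{efficiency:.2f}")
```

Under this model the reported amplification corresponds to an efficiency just under 50% per cycle, in line with the 40-50% range quoted in the text.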
Fig. 3. Autoradiogram showing LCR amplification at different target
concentrations. Labeled invariant oligonucleotides (107 and 109;
200,000 cpm = 40 fmol each) and unlabeled β^A allele oligonucleotides
(101 and 104; 40 fmol each) were incubated with target DNA (ranging
from 100 amol = 6 × 10^7 molecules to <1 molecule per tube of Taq
I-digested β^A-globin plasmid) and ligase as described. Samples were
electrophoresed, gel was autoradiographed overnight, and bands were
counted as described (see Table 2). Bands of 45 and 46 nucleotides
correspond to ligation products of the coding and complementary
β^A-globin oligonucleotides. Lower-molecular-mass products correspond
to ligation of minor species in the synthesized oligonucleotide
preparations that were shorter than intended product. Samples were
loaded in groups of eight, giving the appearance of slower migration
on the right of the autoradiogram.



To test ligase discrimination between complementary and
mismatched oligonucleotides in a direct competition assay,
the above LCR experiment was repeated with or without
oligonucleotides that would give G-T and C-A mismatches
(see Table 3). At higher target concentrations, the
mismatched product ranged from 1.8% to 0.5% of the
complementary product. Mismatched product could not be detected
when using <3 amol of target DNA. As control, excess
mismatched target DNA (β^S- instead of β^A-globin DNA at 6
× 10^7 molecules per tube) gave only 2.1% and 1.5% product.
Thus, the signal from the correctly paired ligation products is
50- to 500-fold higher than from mismatched products, under
either competition or individual LCR ligation conditions.

Detection of β-Globin Alleles in Human Genomic DNA.
DNA isolated from the blood of normal (β^Aβ^A), carrier
(β^Aβ^S), and sickle cell (β^Sβ^S) individuals was tested for
allele-specific LCR detection. With target DNA corresponding
to 10 μl of blood, β^A and β^S alleles could be readily

Table 2. Quantitation of LCR amplification

Target          Product
molecules       formed, %*      Amplification†

6 × 10^7        134‡
2 × 10^7        96
6 × 10^6        107‡
2 × 10^6        78
6 × 10^5        85
2 × 10^5        48              5.8 × 10^4
6 × 10^4        25              1.0 × 10^5
2 × 10^4        4.5             5.4 × 10^4
6 × 10^3        2.3             9.2 × 10^4
2 × 10^3        0.36            4.3 × 10^4
6 × 10^2        0.18            7.2 × 10^4
2 × 10^2        0.14            1.7 × 10^5
60-0§           <0.05

Bands from 30-cycle LCR experiment described in Fig. 3 were
excised from gels and assayed for radioactivity.

* Percentage product formed = cpm in product band/cpm in starting
oligonucleotide band.

† Amplification = no. of product molecules formed/no. of target
molecules.

‡ At higher target concentration, DNA amplification was essentially
complete after 20 cycles; slightly imprecise excision of 30-cycle
bands from this portion of the gel probably accounts for product
formed values >100%.

§ Product formed from 0 to 60 target molecules was indistinguishable
from background (see Fig. 3).
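
The amplification column of Table 2 can be reproduced from the percentage of product formed, taking the 40 fmol of labeled oligonucleotide per tube stated in the methods as the starting amount. The function below is an illustrative sketch; treating the percentage as a fraction of 40 fmol is my reading of the footnotes.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amplification(product_pct: float, target_molecules: float,
                  labeled_oligo_fmol: float = 40.0) -> float:
    """Product molecules formed per starting target molecule."""
    product_molecules = (product_pct / 100.0) * labeled_oligo_fmol * 1e-15 * AVOGADRO
    return product_molecules / target_molecules


if __name__ == "__main__":
    for pct, targets in [(48, 2e5), (25, 6e4), (4.5, 2e4), (2.3, 6e3),
                         (0.36, 2e3), (0.18, 6e2), (0.14, 2e2)]:
        print(f"{targets:.0e} targets: {amplification(pct, targets):.1e}")
```

The computed values agree with the tabulated amplifications to two significant figures, which supports reading the sixth row as 2 × 10^5 target molecules.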



detected by using allele-specific LCR (Fig. 4). As seen with
plasmid-derived target DNA (see Fig. 2), efficiency of ligation
(and hence detection) is somewhat less for β^S- than
β^A-specific oligonucleotides. This difference may be a function
of the exact nucleotide sequence at the ligation junction
or the particular oligonucleotides (with differing 5' tails) used
in these LCR experiments. Nevertheless, the results show
the feasibility of direct LCR allelic detection from blood
samples without any need for primary PCR or self-sustained
sequence replication amplification.

DISCUSSION 

The specificity, yield, and sensitivity of PCR were signifi- 
cantly improved by incorporating use of a thermostable DNA 
polymerase (2), resulting in a simplified procedure that has 

Table 3. Quantitation of LCR amplification with or without
mismatched competitor oligonucleotide

                    Complementary       Complementary and mismatched
                    oligonucleotides    oligonucleotides

Target              Product             Product        Mismatched/
molecules           formed, %*          formed, %*     complementary, %†

6 × 10^7 (β^A)      114*                93             1.0
2 × 10^7 (β^A)      93                  95             1.8
6 × 10^6 (β^A)      102*                93             0.5
2 × 10^6 (β^A)      90                  67             0.5
6 × 10^5 (β^A)      51                  46
2 × 10^5 (β^A)      31                  23
6 × 10^4 (β^A)      17                  9.3
2 × 10^4 (β^A)      8.6                 2.9
6 × 10^3 (β^A)      3.2                 0.8
0                   <0.1                <0.1
6 × 10^7 (β^S)      2.1                 1.5

One set of experiments contained 40 fmol each of β^A allele
oligonucleotides 101 and 104 per tube, exactly as described for Fig.
3, whereas the second set had, in addition, 40 fmol each of
oligonucleotides 103 and 106 per tube (forming G-T and C-A mismatches,
respectively). Bands from 30-cycle LCR experiment, as described
for Fig. 3, were excised from the gels and assayed for radioactivity.

* Percentage product formed = cpm in complementary product
band/cpm in starting oligonucleotide band. Imprecise excision of
two bands from the gel probably accounts for product formed values
>100% (see Table 2).

† Percentage mismatched/complementary = cpm in bands of mismatched
oligonucleotide products/cpm in band of complementary
oligonucleotide products in same lane and indicates noise-to-signal
ratio.
Fig. 4. Detection of β-globin alleles in human genomic DNA by
autoradiogram. DNA was isolated from blood samples of normal
(β^Aβ^A), carrier (β^Aβ^S), and sickle cell (β^Sβ^S) individuals as
described. Genomic DNA (corresponding to 10 μl of blood or 6 × 10^4
nucleated cells) was tested in two separate tubes containing labeled
oligonucleotides (107 and 109; 200,000 cpm = 40 fmol each) and
either unlabeled β^A test oligonucleotides (101 and 104) or unlabeled
β^S test oligonucleotides (102 and 105; 40 fmol each). Both reaction
mixtures were incubated under the same buffer (without salmon
sperm DNA), enzyme, and cycle conditions described. Samples
were electrophoresed, and the gel was autoradiographed overnight as
described. Ligation products of 45 and 46 or 47 and 48 nucleotides
indicate presence of the β^A- or β^S-globin gene, respectively. Oligo,
oligonucleotide.

become widely applicable (23, 24). Similarly, this report
demonstrates the utility of thermostable ligase for
allelic-specific gene detection under both LDR and LCR conditions.
Both LCR and PCR amplification derive their specificity from
the initial hybridization of primer to target DNA, and this is
enhanced by (i) use of oligonucleotides of sufficient length to
be unique in the human genome and (ii) use of temperatures
near the oligonucleotide Tm. LCR amplification faithfully
detected as few as 200 initial target molecules, as well as both β^A
and β^S alleles directly from genomic DNA. LCR did not
amplify a T-T, G-T, C-T, or C-A 3'-terminal mismatch, as has
been reported for allele-specific PCR amplifications (25).
Whether LCR will tolerate internal mismatches present in viral
variants remains to be determined (25).

LCR amplification/detection is compatible with a primary
amplification of genomic DNA by either PCR (2) or
self-sustained sequence replication (3). Such a primary
amplification could allow for LCR detection of emerging viral
subpopulations where the mutations are known, such as the multiple
mutations in human immunodeficiency virus conferring
resistance to 3'-azido-3'-deoxythymidine (AZT) (26). One can also
envisage multiplexing the primary amplification of dozens of
loci simultaneously (27) and aliquoting products into separate
microtiter wells. A subsequent round of LCR amplification/
detection could then distinguish a particular target locus, even
if it were initially amplified only in the amol range. Such a
multiplex PCR/LCR detection assay, with the potential for an
automated format, could (i) rapidly screen large populations
for monogenic disease polymorphisms, (ii) distinguish several
polymorphisms simultaneously from a single sperm to map the
relative positions of these polymorphisms (28), and (iii) help
eliminate current ambiguities in DNA identification of
individuals for forensic or paternity cases (29).

The potential uses of thermostable enzymes that survive
the temperature-cycling conditions required to denature
double-stranded DNA are just now being tapped. With variations
of the LCR concepts outlined above, thermostable ligase
could be used to (i) covalently capture specific DNA
fragments to a solid matrix, with the aid of "template
oligonucleotides" (40- to 50-mers) complementary to both the
fragment end as well as a second oligonucleotide attached to a
solid support, (ii) covalently link PCR-generated fragments
(for example, protein domains or exons) in specific order, and
(iii) covalently link two members of a hexamer oligonucleotide
library to form specific dodecamers for directed sequencing
of cosmids and other large DNAs (30).
ABSTRACT The chemical reactivity of thymine (T), when
mismatched with the bases cytosine, guanine, and thymine, and
of cytosine (C), when mismatched with thymine, adenine, and
cytosine, has been examined. Heteroduplex DNAs containing
such mismatched base pairs were first incubated with osmium
tetroxide (for T and C mismatches) or hydroxylamine (for C
mismatches) and then incubated with piperidine to cleave the
DNA at the modified mismatched base. This cleavage was
studied with an internally labeled strand containing the
mismatched T or C, such that DNA cleavage and [...]
could be detected by gel electrophoresis. Cleavage at a total of
[...] T and 21 C mismatches isolated (by at least three properly
paired bases on both sides) single-base-pair mismatches was
identified. All T or C mismatches studied were cleaved. By
using end-labeled DNA probes containing T or C single-base-pair
mismatches and conditions for limited cleavage, we were
able to show that cleavage was at the base predicted by
sequence analysis and that mismatches in a length of DNA
could be readily detected by such an approach. This procedure
may [...] sense and antisense probes and thus may be used to
identify the mutated base and its position in a heteroduplex.

Definition of the exact single-base change in genes as a result
of mutation is an important goal [...]. As
sequencing complete genes to identify base changes is tedious,
attempts have been made to improve the efficiency of
the procedure (1-4). (i) Heteroduplexes formed between
wild-type and variant DNAs have been treated with the
single-strand-specific S1 nuclease to cleave the DNA at the
point of the mismatched bases (1). (ii) The differential
mobility of native and denatured DNA-DNA heteroduplexes
coupled with their differential melting temperatures has been
exploited by Myers et al. (2). (iii) Since this method was not
generally applicable, Myers et al. (3) described a method in
which mismatches in RNA-DNA heteroduplexes were
cleaved by RNase A. (iv) An alternative approach in which
RNase A was used to cleave mismatches in RNA-RNA
heteroduplexes has also been described (4). (v) Novack et al.
(5) have reported that single-base-pair mismatches in
DNA-DNA heteroduplexes react with a carbodiimide.

As these methods did not detect all mutations, we have
examined the chemical reactivity of mismatched bases in
DNA-DNA heteroduplexes in more detail. We chose the
steroid 21-hydroxylase (21-OHase) gene because of its
medical importance and because of the large amount of
polymorphism in the gene and pseudogene (6). We have screened
those reagents, used first in the study of the secondary
structure of tRNA (7) and then in DNA sequencing (8), that
lead to cleavage of the DNA chain on subsequent reaction
with piperidine. Two reagents, osmium tetroxide and
hydroxylamine, were found that potentially can recognize all
variants, as they react with mismatched thymine (T) and
cytosine (C), respectively.

MATERIALS AND METHODS 
Preparation of DNA and Probes. Plasmid and M13 subclone
DNA were prepared by standard methods (6). Internally
labeled DNA probes were prepared from M13mp8 or M13mp9
clones containing the DNA fragments that were used to
generate the sequence of the 21-OHase A gene, the 21-OHase
B gene, and the mutant 21-OHase B gene (6) (Fig. 1).
Subclones carrying the desired DNA [...] complementary
to the probe required were labeled by standard
methods (9) by using the M13 universal [...].
All dNTPs were at a concentration of 0.25 mM except dATP,
which was added so that the [...].
Typically 2 ng of primer was annealed with 50 ng of M13
DNA [...] by heating at 90°C for 4 min followed by
[...] added with 1 μl of [α-32P]dATP (3000 Ci/mmol; 1 Ci = 37
GBq; Radiochemical Centre) and 1 μl (7.5 units) of the
Klenow fragment of DNA polymerase I (Pharmacia) in a final
volume of 17.6 μl, and the mixture was [...]
for 1 hr. All dNTPs were then added at 0.25 mM and
incubated 30 min to chase. Samples were then extracted with
chloroform/phenol, 1:1 (vol/vol), and the [...]
enzymes appropriate for the heteroduplex [...]
(Table 1). End-labeled DNA probes (see Table 1) were derived
from the appropriate digests of the [...] fragment
of the wild-type or mutant 21-OHase B gene or [...]
5.5-kilobase [...]
cloned in the Pvu II site of the plasmid pAT153 [...].
Fragments were purified by electrophoresis in nondenaturing
[...].

Heteroduplex Formation. Heteroduplexes contained
unlabeled [...] plasmid subclones digested with restriction
[...] digested plasmid DNA to labeled probe DNA was used for
internally labeled probes. The mixture (20-100 μl) was
heated 5 min at 100°C and annealed 1 hr at [...]
NaCl/3.5 mM MgCl2/3 mM Tris-HCl, pH 7.7. Heteroduplex
DNA was precipitated with ethanol and then taken up [...]

Abbreviation: 21-OHase, steroid 21-hydroxylase.
Fig. 1. Map of the wild-type 21-OHase B gene (6). The upper line
shows the exon-intron structure of the gene, and the lower line [...]
restriction map. A, Acc I; B, BamHI; E, EcoRI; H, Hinf I; K, Kpn I;
P, Pvu II; S, Sst I; [...]. The numbers above the restriction map
show the approximate positions of differences of the mutant B gene
(6) (lower row of numbers) [...] from the wild-type B gene with the
numbering starting from the first difference from the 5' end. [...]
of those differences studied are shown. The horizontal bars represent
the B gene, the mutant B gene, or the A gene DNA from M13 [...] as
specified in Table 1 or [...] plasmid DNA used in end-labeling
studies (Table 1). The solid boxes represent the DNA used [...]



distilled water at 1000 cpm/μl. Approximately 6000 cpm of
labeled probe DNA was used per assay tube.

Hydroxylamine Treatment of DNA. Hydroxylamine
hydrochloride (1.39 g) (Analar grade; BDH) was dissolved in 1.6 ml
of distilled water and the pH was adjusted to 6.0 with
diethylamine (Fluka). The final volume was ≈4 ml, giving a
hydroxylamine concentration of ≈2.5 M.

DNA in 6 μl of distilled water was treated with 20 μl of
hydroxylamine solution at 37°C for 2 hr or as indicated. The
reaction was stopped by transferring the mixture to ice and
adding 200 μl of stop solution containing 0.3 M sodium acetate,
0.1 mM Na2EDTA (pH 5.2) and tRNA (25 μg/ml; baker's
yeast, Boehringer Mannheim), and the DNA was precipitated
with ethanol. After a further ethanol precipitation, the DNA
pellet was washed once with 70% (vol/vol) ethanol and dried.

Osmium Tetroxide Treatment of DNA. DNA in 6 μl of
distilled water was treated with 15 μl of 4% (wt/vol) osmium
tetroxide in water (Aldrich) in a total volume of 24.5 μl
containing 1 mM EDTA, 10 mM Tris-HCl (pH 7.7), and 1.5%
(vol/vol) pyridine. Incubation was at 37°C as indicated. The
reaction was stopped as described for hydroxylamine.
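
The working concentrations implied by these two protocols follow from a simple dilution. The sketch below assumes the hydroxylamine reaction volume is the sum of the 6 μl of DNA and 20 μl of reagent (the text does not state a total), and uses the stated 24.5 μl total for osmium tetroxide. The function name is illustrative.

```python
def final_concentration(stock: float, volume_added_ul: float,
                        total_volume_ul: float) -> float:
    """Concentration after dilution, in the same units as the stock."""
    return stock * volume_added_ul / total_volume_ul


if __name__ == "__main__":
    # Hydroxylamine: 20 ul of ~2.5 M stock plus 6 ul of DNA (assumed total 26 ul).
    print(f"{final_concentration(2.5, 20, 20 + 6):.2f} M hydroxylamine")
    # Osmium tetroxide: 15 ul of 4% (wt/vol) stock in a 24.5 ul reaction.
    print(f"{final_concentration(4.0, 15, 24.5):.2f} % (wt/vol) osmium tetroxide")
```

The hydroxylamine figure comes out near 1.9 M, consistent with the "2 M hydroxylamine" quoted for the incubations in Results.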

Piperidine Cleavage. Chemical cleavage of the C and T 
bases that had reacted with hydroxylamine or osmium 
tetroxide was achieved by incubating the heteroduplexes 
with piperidine (8). Piperidine (50 μl at 1 M) was added to the
dry DNA pellet and incubated at 90°C for 30 min. DNA was 
precipitated with ethanol, washed with 70% (vol/vol) etha- 
nol, and dried. For osmium tetroxide-treated DNA, ethanol 
precipitation after piperidine treatment was in a dry 
ice/methanol bath, and all subsequent operations were at or 
below 4°C until the dried pellet was obtained. 

Electrophoresis of Products. Samples were incubated in 10
μl of 60% (vol/vol) formamide/0.1% bromophenol blue/
0.1% xylene cyanol FF/35 mM Na2EDTA, pH 7.4, at 100°C
for 4 min before application to denaturing urea gels (8).
Cleavage and recovery was estimated by measuring radio- 
activity in gel slices and is reported at 2 hr for hydroxylamine 
cleavage and at 30 min for osmium tetroxide. Recovery was 
calculated relative to an unincubated control. 

RESULTS 

Hydroxylamine Cleavage of Mismatched C Bases. Prelimi- 
nary experiments indicated that optimal cleavage of mis- 
matched C bases was obtained with a 2-hr incubation in 2 M 
hydroxylamine at pH 6. Lower concentrations were not as 
effective, and longer times led to the destruction of the DNA. 



The cleavage of C-C, C-T, and C-A mismatches [mutations
B8, B4, and B11, respectively (6)] as a function of time is
shown in Fig. 2. Cleavage at 2 hr was 93%, 88%, and 74% with
recoveries of 65%, 71%, and 23%, respectively (recoveries of
70% were later consistently achieved by use of methanol/dry
ice for ethanol precipitation). The 215-base fragment in Fig.
2B is due to cleavage of a C-A mismatch (mutation B3) that
lies 20 bases from the C-T mismatch. Cleavage at this C-A
mismatch was not quantitated. In all cases the size of the
cleavage products was consistent with cleavage at the
respective mismatches. Controls (heteroduplexes with no
incubation, with no hydroxylamine added, or with no piperidine
added, a homoduplex with the same labeled strand, and
a heteroduplex with the opposite strand labeled) showed no
specific cleavage of the labeled strand (Fig. 2). For mutation
B11 (C-A mismatch), the probe included 10 bases of the
vector and the size heterogeneity seen in Fig. 2C (lane 5) is
due to cleavage of those unpaired bases.

The above results are consistent with cleavage at the 
position of the mismatch identified by sequence analysis (6) 
but do not prove that it is at this point. To determine the exact 
position of cleavage the 3' end of probe VI was end-labeled 
by using the Klenow fragment of DNA polymerase I. A 
portion of the probe preparation was sequenced by the 
Maxam-Gilbert method (8). Another portion of the preparation
was used to make heteroduplex that was then incubated
with hydroxylamine for 2 hr. The product of the cleavage 
reaction is exactly adjacent to the mismatched C (Fig. 3, lane 
5). Two minor products, 1 and 3 bases from the mismatched 
C, are also apparent presumably due to propagation (10) 
where paired C bases near the mismatch show some reac- 
tivity with hydroxylamine, similar to unmatched bases near 
the loops of tRNA. 

The ability of hydroxylamine and piperidine to cleave C 
mismatches in various contexts is summarized in Table 1. 
Cleavages of 90%, 84%, and 87% were observed for a C-C
mismatch (mutation B5) and two C-A mismatches (mutations
B1 and A82a), respectively. In other cases only the ability to
cleave was recorded. 

Screening of a larger number of mismatches for cleavage 
was possible in the probe IV/ V region (see Fig. 1) making use 
of the large number of differences between the 21-OHase A 
and B genes in this area. An end-labeled probe was used to 
facilitate the positioning of cleavage and partial cleavage with 
hydroxylamine was used to increase the yield of the various 
bands expected. One such experiment is illustrated in Fig. 3 
(lane 6) with a probe from region VI that hybridized to the 
Fig. 2. Hydroxylamine reaction with an internally labeled
probe. (A) C-C mismatch (mutation B8). (B) C-A and C-T mismatches
(mutations B4 and B3, respectively). (C) C-A mismatch (mutation
B11). All incubations were at 37°C with 2 M hydroxylamine unless
otherwise indicated. (A) Heteroduplexes (1.2 μg) with C-C as the
only mismatch were incubated for 0 (lane 5), 30 (lane 6), 60 (lane 7),
or 120 (lane 8) min. Controls were homoduplexes (1.2 μg) with the
same labeled strand but with an unlabeled wild-type DNA incubated
120 min (lane 2), heteroduplexes (1.2 μg) incubated for 120 min
without the subsequent addition of piperidine (lane 3), and
heteroduplexes (1.2 μg) incubated without hydroxylamine for 120 min
(lane 4). Lane 1 contains molecular size markers. Size markers in
bases are on the left; sizes of fragments and original probes are on
the right. (B) Heteroduplexes (2 μg) containing C-A and C-T as the
only mismatches were incubated for 0 (lane 3), 30 (lane 6), 60
(lane 7), and 120 (lane 8) min. The band at 195 bases represents
cleavage at the C-T mismatch, and the band at 215 bases represents
cleavage at the C-A mismatch only. Controls were the complementary
heteroduplexes (2 μg) with a G-T mismatch (in which the 5' mutant
strand is labeled and the wild-type DNA is unlabeled) incubated for 0
(lane 1) and 120 (lane 2) min, heteroduplexes (2.0 μg) incubated for
120 min without hydroxylamine (lane 4), and heteroduplexes (2.0 μg)
incubated for 120 min without subsequent addition of piperidine
(lane 5). Lane 9 contains molecular size markers. (C) Heteroduplexes
(0.3 μg) containing C-A as the only mismatch (except for ragged ends
from a small region of the M13 vector, see Table 1) were incubated
for 0 (lane 4), 30 (lane 5), 60 (lane 6), or 120 (lane 7) min.
Controls were homoduplexes (0.3 μg) with the same labeled strand but
with an unlabeled wild-type DNA (Table 1) incubated for 120 min
(lane 1), heteroduplexes (0.3 μg) incubated for 120 min without
subsequent addition of piperidine (lane 2), and heteroduplexes
(0.3 μg) incubated for 120 min without hydroxylamine (lane 3). Lane 8
contains molecular size markers. For B and C, numbers on the right
refer to size markers in bases and numbers on the left are the size of
fragments and original probes.

unlabeled 21-OHase A gene DNA. Labeling was at the 3' end 
of the sense strand. After partial (30 min) reaction with 
hydroxylamine only the expected 246-, 147-, 90-, 62-, and 
36-base fragments are seen. This is consistent with cleavage 
at the five expected C mismatches: C-A, C-T, C-T, C-C, and 
C-C (mutations A24, A35, A45, A48, and A50, respectively). 
One mutation (A48) did not produce an isolated C mismatch 
due to the presence of a T-C mismatch immediately adjacent. 
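
The band pattern described above follows from a simple rule for end-labeled probes: only the piece that keeps the label is visible, so each cleaved site should give one band whose size equals its distance from the labeled end. The following is a minimal sketch of that bookkeeping, not part of the original method. The probe length of 300 bases and the 2-base matching tolerance are hypothetical values chosen for illustration; the five site distances are the fragment sizes quoted in the text.

```python
def labeled_fragment_sizes(probe_length, cut_sites):
    """Return the labeled band sizes expected after partial cleavage.

    cut_sites are distances (in bases) from the labeled end to each
    cleaved site. Only the piece that keeps the label is visible, so each
    site gives one band; the uncleaved probe gives the largest band.
    """
    for site in cut_sites:
        if not 0 < site < probe_length:
            raise ValueError(f"site {site} lies outside the probe")
    return sorted(set(cut_sites) | {probe_length}, reverse=True)


def unexpected_bands(observed, expected, tolerance=2):
    """List observed bands that match no expected size within the tolerance."""
    return [band for band in observed
            if all(abs(band - size) > tolerance for size in expected)]


PROBE_LENGTH = 300                # hypothetical; not stated in the text
SITES = [246, 147, 90, 62, 36]    # fragment sizes reported for region VI

expected = labeled_fragment_sizes(PROBE_LENGTH, SITES)
print(expected)
# A made-up gel reading with one extra band at 120 bases.
print(unexpected_bands([246, 147, 91, 62, 36, 120], expected))
```

A band that falls outside the tolerance of every expected size would point to a cleavage not accounted for by the known mismatches.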

Use of a probe from region X (Fig. 1) potentially allows the 
study of 8 C mismatches and 1 unpaired C base, whereas use 
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Fig. 3. Analysis of the position of cleavage of a heteroduplex by 
hydroxylamine with end-labeled DNA. Lanes 1-4 show a Maxam- 
Gilbert sequencing ladder of the end-labeled probe (region VI) (G, A, 
T, and C, respectively). Lane 5 shows the same labeled DNA after 
formation of a heteroduplex with a 12 times excess of unlabeled 
mutant DNA (Table 1) and incubation with 2 M hydroxylamine for 
2 hr at 37°C. Lane 6 shows the same labeled DNA after formation of 
a heteroduplex with a 12 times excess of unlabeled A gene DNA and 
incubation with 2 M hydroxylamine for 30 min at 37°C. Lane 7 shows 
the same labeled DNA after formation of a homoduplex with a 12 
times excess of unlabeled wild-type DNA and incubation with 2 M 
hydroxylamine for 2 hr at 37°C. To the right are shown the sequences 
around two of the cleavage points (arrows) where the sequence is 
readable. Numbers on the left are the sizes of fragments (and probe) 
produced by the heteroduplex shown in lane 6. Numbers on the right 
are the sizes of markers (contained in lane 8). The amount of DNA 
in lanes 5-7 was 2.2 µg. 

of a probe from region XI (Fig. 1) potentially allows the study 
of 11 C mismatches and 1 unpaired C base in a loop. Table 1 
shows those cleavages where neighboring mismatches are >3 
bases away. With the region X and the region XI probes in the 
regions able to be assessed, all C mismatches or unpaired C 
bases were cleaved. Except for those C bases near the C 
mismatches that showed lesser cleavage, presumably due to 
propagation, no unexpected cleavages were found. 

Osmium Tetroxide Cleavage of Mismatched T Bases. The 
cleavage of T-G, T-C, and T-T mismatches (mutations B3, B4, 
and A64, respectively) is shown with increasing time in Fig. 
4. Cleavage was 61%, 78%, and 17% with recoveries of 33%, 
30%, and 21%, respectively. Heteroduplex controls without 
incubation, osmium tetroxide, or piperidine, or a homodu- 
plex control showed no specific cleavage. In all cases the size 
of the cleavage products was consistent with cleavage at the 
respective mismatch site. 

Table 1 summarizes the results obtained with a further 
three T mismatches studied with an internally labeled probe. 
Substantial cleavage of the probes used was observed for T-G 
(mutation B10a) and for T-C (mutation A65) mismatches. 
T mismatches whose cleavage was not quantitated were 
studied with end-labeled probes (see below). Cleavage at T-G 
and T-C mismatches (mutations A17, A23, A29, and A30) is 
shown in Fig. 5 and Table 1. T-G mismatches (mutations A46, 
A84, and A88) and a T-T mismatch (mutation A31) were 
cleaved in other end-labeled probes (Table 1). 

To determine the exact position of cleavage, products of 
the Maxam-Gilbert sequencing reactions of the end-labeled 
probe were electrophoresed next to the heteroduplex that had 
reacted with osmium tetroxide (Fig. 5, lanes 1-6). It can be 
seen that the two isolated T mismatches (A29 and A30) are 
cleaved at the position of the mismatch. 

Fig. 5 also shows the use of end-labeled probe in a 
heteroduplex with unlabeled DNA suspected of containing 
sequence changes; partial cleavage is a convenient method 
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Table 1. Summary of C and T mismatches cleaved 

Mismatch  Mutation  Probe*      Cleavage†   Sequence at mismatch‡

C/A       B3        5'/III/M    +ve(+ve)    TGCAC(C)TGCTG
C/T       B4        5'/III/M    88(57)      CTCCC(C)ATCTA
C/C       B8        3'/VII/B    93(79)      CCATG(C)TCGGC
C/A       B11       5'/VIII/B   74(+ve)     CTCGG(C)AGTCA
C/C       B5        5'/IV/B     90          CCCCA(C)CTCCT
C/A       A13       5'/II/B     +ve         CGTCT(C)GCCAT
C/A       A14       5'/II/B     +ve         CCTCC(C)GCCTC
C/A       B1        3'/I/M      84(81)      CCTCC(C)TTCAC
C/A       A82a      5'/IX/A     87          GCTCC(C)GTACG
C/A       A24       5'/VI/B     E           CTATG(C)TGCCC
C/T       A35       5'/VI/B     E           AGGTC(C)CTGGA
C/T       A45       5'/VI/B     E           GAGGC(C)GAAGA
C/C       A48       5'/VI/B     E           GCCTT(C)ATCAG
C/C       A50       5'/VI/B     E           CCCCA(C)CTCCT
C/T       A43       3'/X/B      E           CCAAC(C)CCTGC
C/A       A44       3'/X/B      E           CCTCC(C)CAACC
C/T       A47       3'/X/B      E           GAAGG(C)AGCTS
C/T       A23       3'/XI/A     E           CAAGA(C)CCCAT
C/A       A25       3'/XI/A     E           GAATT(C)AAGAC
C/A       A26       3'/XI/A     E           GAGAC(C)AGGAA
C/A       A27       3'/XI/A     E           GATCA(C)TTGAG
C/A       A28       3'/XI/A     E           GAGGC(C)GAGGT
C/T       A29       3'/XI/A     E           GGCTC(C)CACTT

T/T       A64       5'/XII/B    17          CATCA(T)CTGTT
T/C       B4        3'/III/B    78          TAGAT(T)GGGAG
T/G       B3        5'/III/B    61          TGCAC(T)TGCTG
T/G       B10a      5'/VIII/B   46          GCTCC(T)GTACG
C/A       B11       5'/VIII/B   (+ve)       CTCGG(C)AGTCA
T/C       A65§      3'/XIII/A   57          AAGGA(T)GGAGT
C/C       A67§      3'/XIII/A   (+ve)       TTGAC(C)TCCTG
T/G       A30       5'/VI/B     E           ACCTT(T)GGGGC
T/C       A29       5'/VI/B     E           AAGTG(T)GAGCC
T/C       A23       5'/VI/B     E           ATGGG(T)TCTTG
T/G       A17       5'/VI/B     E           AGGGC(T)GGGGG
T/T       A31       3'/X/B      E           GGGGA(T)GCCCC
T/G       A46       3'/X/B      E           GCAGC(T)GAGGG
T/G       A88       5'/XIV/B    E           TTAAT(T)CTGAG
T/G       A84       5'/XIV/B    E           TGCTC(T)TCCCG
C/A       A89       5'/XIV/B    (E)         GCTGG(C)CCTTT

*Entry gives the probe sense, area of the gene used, and the gene 
used (see Fig. 1). 

†Cleavage is defined in Materials and Methods. +ve indicates 
cleavage seen but not quantitated. E denotes that end-labeled probe 
was used and cleavage was seen but not quantitated. Value in 
parentheses refers to cleavage of C mismatches with osmium 
tetroxide. 

‡Mismatched bases that are cleaved are in parentheses and nearby 
mismatched bases are underlined. 
§Mutations were described after publication of ref. 6. 
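
Several mutations appear in Table 1 twice, once as a C mismatch and once as a T mismatch (for example B4, A23, and A29). For these, the two sequence entries appear to be the same site read on opposite strands. The short check below is an illustrative sketch rather than part of the original analysis; the helper names are hypothetical and the sequences are copied from the table.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, and T."""
    return seq.translate(COMPLEMENT)[::-1]


def flanks(entry):
    """Split an entry such as 'CTCCC(C)ATCTA' into (left, base, right)."""
    left, rest = entry.split("(")
    base, right = rest.split(")")
    return left, base, right


def same_site_opposite_strand(entry_a, entry_b):
    """True if the flanks of entry_b are the reverse complement of entry_a's."""
    left_a, _, right_a = flanks(entry_a)
    left_b, _, right_b = flanks(entry_b)
    return (left_a == reverse_complement(right_b)
            and right_a == reverse_complement(left_b))


# (C-mismatch entry, T-mismatch entry) for mutations listed twice in Table 1.
PAIRED_ENTRIES = {
    "B4": ("CTCCC(C)ATCTA", "TAGAT(T)GGGAG"),
    "A23": ("CAAGA(C)CCCAT", "ATGGG(T)TCTTG"),
    "A29": ("GGCTC(C)CACTT", "AAGTG(T)GAGCC"),
}

for name, (c_entry, t_entry) in PAIRED_ENTRIES.items():
    print(name, same_site_opposite_strand(c_entry, t_entry))
```

Only the flanking bases are compared; the parenthesized base is the mismatched one and is not expected to be complementary.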

for detecting these differences. After a short incubation with 
osmium tetroxide (lane 5, 1 min; lane 6, 5 min) and subse- 
quent cleavage of the heteroduplex with piperidine, a number 
of bands not seen in the homoduplex control treated in the 
same way (lane 8) are apparent. Consideration of the se- 
quencing tracks (lanes 1-4), the molecular size markers (lane 
9), and the sequence allows assignment of the bands to 
specific T bases. The five single-base-pair T mismatches are 
indicated by mutation name and lead to five of the six major 
bands seen in lane 5. The sixth major band (second from top) 
results from the cleavage of a T next to a loop in the 21-OHase 
gene due to a 4-base insertion. The next strongest bands, 
two below mutation A30, are due to cleavage of T mis- 
matches next to a single-base-pair mismatch (mutation A31) 
or 3 bases from a 3-base insert. Three further examples of the 
former are seen in the three faint bands above mutation A29. 
The minor band below mutation A23 (lanes 5 and 6) and at 
the second hydroxylamine cleavage of C from the bottom (lane 
7) is consistent with a slower rate of reaction of osmium 
tetroxide with C mismatches relative to T mismatches (see 
below). 

The hydroxylamine cleavage of the same heteroduplex 
(also illustrated in Fig. 3, lane 6) illustrates how a 
stretch of DNA can be scanned for all T and C mismatches 
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/ , , -r o • 0sm,um tetr <>xide reaction with internally labeled probe 
(A) T-G mismatch (mutation B3). (B) T-C mismatch (mutation B4). 
(C) T-T mismatch (mutation A64). All incubations were with 
2.4% (wt/vol) osmium tetroxide at 37°C unless otherwise indicated. 
(A) Heteroduplexes (0.08 µg) containing T-G and A-C as the only 
mismatches were incubated for 0 (lane 4), 30 (lane 5), 60 (lane 6), or 
120 (lane 7) min. Controls were homoduplexes (0.08 µg) with the 
same labeled strand but with unlabeled wild-type DNA incubated for 
1.0 min with (lane 2) or heteroduplexes without (lane 3) osmium 
tetroxide. (B) Heteroduplexes (0.35 µg) containing T-C and A-C as 
the only mismatches were incubated for 0 (lane 5), 15 (lane 6), 30 (lane 
7), or 60 (lane 8) min. Controls were homoduplexes (0.35 µg) with the 
same labeled strand but with unlabeled wild-type DNA incubated for 
30 min (lane 2), heteroduplexes (0.35 µg) incubated for 60 min 
without the subsequent addition of piperidine (lane 3), and hetero- 
duplexes (0.35 µg) incubated at 37°C for 60 min without osmium 
tetroxide (lane 4). (C) Heteroduplexes (0.19 µg) containing T-T as the 
only mismatch were incubated for 0 (lane 5), 15 (lane 6), 30 (lane 7), 
or 60 (lane 8) min. Controls, incubated for 60 min, were homodu- 
plexes (0.19 µg) with the same labeled strand but unlabeled wild-type 
DNA (lane 2), heteroduplexes (0.19 µg) incubated without the 
subsequent addition of piperidine (lane 3), and heteroduplexes (0.19 
µg) incubated without osmium tetroxide (lane 4). In B and C only half 
the DNA was loaded onto lanes 3 and 4. For A-C, numbers on the 
left refer to the size of the marker fragments (lane 1) in bases and 
numbers on the right refer to the size of the fragments and original 
probe in bases. 

Osmium Tetroxide Cleavage of Mismatched C Bases. The C 
mismatches studied for cleavage by hydroxylamine and 
piperidine were also studied for cleavage by osmium tetrox- 
ide and piperidine (Table 1). By using internally labeled 
probes containing C-T, C-C, and C-A mismatches (mutations 
B4, B8, and B1), cleavages of 57%, 78%, and 81%, respec- 
tively, were found. A further two C-A mismatches (mutations 
B3 and B11) were also cleaved, but the values were not 
quantifiable. The rate of cleavage of the C mismatches by 
osmium tetroxide was slower than cleavage of T mismatches 
(data not shown). 

DISCUSSION 

We have screened a variety of reagents for their ability to 
react with purine or pyrimidine bases when they are mis- 
matched in a duplex in such a way that the probe containing 
the mismatched bases is cleaved at that point by piperidine. 
Such reagents included hydrazine, potassium permanganate, 
formic acid, sodium hydroxide, diethyl pyrocarbonate, meth- 
ylene blue, hydroxylamine, and osmium tetroxide and have 
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Fig 5 Analysis of the position of cleavage of the heteroduplex 
shown in Fig. 3 by osmium tetroxide as well as hydroxylamine with 
end-labeled DNA. Lanes 1-4 show a Maxam-Gilbert sequencing 
ladder of the end-labeled probe (region VI) (G, A, T, and C, 
respectively). Lanes 5 and 6 show the same labeled DNA after 
formation of a heteroduplex with a 12 times excess of unlabeled A 
gene DNA (Table 1) and incubation with 2.4% (wt/vol) osmium 
tetroxide for 1 and 5 min, respectively, at 37°C. Lane 7 shows the 
same DNA heteroduplex treated with 2 M hydroxylamine for 10 min 
at 37°C. Lane 8 is a homoduplex control with the same labeled strand 
but with wild-type unlabeled DNA treated with 2.4% (wt/vol) 
osmium tetroxide for 5 min at 37°C. To the left are shown the 
sequences around two of the cleavage points (arrow), where the 
sequence is readable. Letters and numbers on the left are the 
mutation numbers represented by the cleavages in lanes 5 and 6, and 
numbers on the right are the size in bases of the markers (lane 9). 
The amount of DNA in lanes 5-8 was 2.6 µg. 

been used for structural studies of tRNA (7, 10), for sequenc- 
ing nucleic acids (8), and for Z-DNA studies (11). In initial 
experiments, hydroxylamine (12-14) and osmium tetroxide 
(12), showed considerable promise, and conditions were 
established for maximal cleavage of mismatched C and T, 
respectively (data not shown). We applied these conditions to 
a large number of T and C mismatches and showed that all 13 
T mismatches studied were cleaved, including 2 T-T, 4 T-C, 
and 7 T-G mismatches. All 21 C mismatches studied were also 
cleaved, including 2 C-C, 7 C-T, and 12 C-A mismatches. At 
least one example of each C mismatch was cleaved with 
osmium tetroxide at a slower rate consistent with earlier 
studies (15). Previous work on tRNA with osmium tetroxide 
(16) and O-methylhydroxylamine (17), a compound related to 
hydroxylamine, allowed us to predict that unmatched C or T 
would be reactive, and this was found in three cases of 
unmatched C bases (data not shown). Thus all types of 
mutations (i.e., insertions, deletions, and base changes) can 
be detected by the procedure described here. 

The use of end-labeled probes (Figs. 3 and 5) allowed us (i) 
to confirm that, for selected cases, the point of cleavage was 
at the point predicted; (ii) to collect further examples of C or 
T mismatch cleavages; and (iii) to use the above findings to 
test procedures to detect mismatches, and hence mutations 
or polymorphisms, after wild-type (or reference) DNA had 
been annealed to variant DNA. 
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The method for detection of mismatched bases described 
in this paper as applied to cloned DNA can be contrasted with 
two other methods, the RNase method (3, 4) and the 
carbodiimide method (5). The RNase method (3, 4) needs an 
extra step of cloning (into the SP6 vector) beyond that needed 
for the carbodiimide method (5) or the method described 
here. However, the greatest drawback of the RNase method 
appears to be the variable cleavage of some mismatches from 
0 of 6 G-C mismatches to 1 of 14 G-T and 1 of 7 G-A 
mismatches to excellent cleavage of all 22 C-A mismatches 
(3). The study of mismatches with the end-labeled probe is 
theoretically possible with the RNase method but has not yet 
been reported. The carbodiimide method requires the heter- 
oduplex first to be made blunt-ended, but its potential for 
detecting mismatches is unclear; the results for only two 
mismatches (T-C and G-T) were given, although positive 
results for G-G and T-T mismatches were mentioned but not 
shown. Because this method is not a cleavage method, 
fragments cannot be detected by PAGE. In contrast, the 
strengths of our method appear to be (i) that no extra cloning 
is required beyond that for cloning and sequencing the 
wild-type (reference) DNA, (ii) that it is a cleavage method 
that allows easy assessment, (iii) that because it is a chemical 
method, it may be more reproducible than enzymatic meth- 
ods, (iv) that comparison with a Maxam-Gilbert sequencing 
ladder of limited cleavage of a heteroduplex allows rapid and 
ready identification of position and type of mismatch, and (v) 
that as no mismatches have yet been found that do not cleave, 
it is possible that all mismatches may be detectable. Thus if 
a labeled probe contains a mismatched T or C in its hetero- 
duplex, it can be readily detected, although a mismatched A 
or G will not. However, a probe of the opposite sense will 
contain a mismatched T or C, respectively, and the mismatch 
then will be detected. 
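
The strand logic in the preceding paragraph can be written out explicitly. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name and return format are hypothetical, and the reagent assignment simply follows the text (hydroxylamine for a mismatched C, osmium tetroxide for a mismatched T).

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Reagent that modifies a mismatched base on the labeled probe strand.
REAGENT = {"C": "hydroxylamine", "T": "osmium tetroxide"}


def choose_probe(probe_base, target_base):
    """Pick the probe sense and reagent for a probe/target base pair.

    Returns None for a Watson-Crick pair (no mismatch to detect).
    Otherwise returns (probe sense, mismatch as probe-target, reagent).
    """
    if PAIRS[probe_base] == target_base:
        return None
    if probe_base in REAGENT:
        return ("same sense", f"{probe_base}-{target_base}", REAGENT[probe_base])
    # A or G on the probe is not cleaved; on the opposite-sense probe the
    # same site presents the complementary bases, i.e. a mismatched T or C.
    flipped_probe = PAIRS[probe_base]
    flipped_target = PAIRS[target_base]
    return ("opposite sense", f"{flipped_probe}-{flipped_target}",
            REAGENT[flipped_probe])


print(choose_probe("G", "T"))
print(choose_probe("A", "C"))
print(choose_probe("C", "C"))
```

The text also reports that osmium tetroxide cleaves mismatched C at a slower rate; that secondary option is left out of the sketch for brevity.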

Our method should be applicable to genomic DNA, par- 
ticularly in the analysis of the defective genes associated with 
inherited diseases and in the study of oncogenes that differ by 
a few bases. In addition our procedure may be used to 
compare related virus isolates and may provide a convenient 
and rapid check of in vitro-mutagenized DNA fragments. 
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There is great heterogeneity in the way indi- 
viduals respond to medications, in terms of 
both host toxicity and treatment efficacy. Potential causes for 
such variability in drug effects include the pathogenesis and 
severity of the disease being treated; drug interactions; and the 
individual's age, nutritional status, renal and liver function, and 
concomitant illnesses. Despite the potential importance of these 
clinical variables in determining drug effects, it is now recognized 
that inherited differences in the metabolism and disposition of 
drugs, and genetic polymorphisms in the targets of drug therapy 
(such as receptors), can have an even greater influence on the 
efficacy and toxicity of medications. Clinical observations of such 
inherited differences in drug effects were first documented in the 
1950s, exemplified by the relation between prolonged muscle 
relaxation after suxamethonium and an inherited deficiency of 
plasma cholinesterase (1), hemolysis after antimalarial therapy and 
the inherited level of erythrocyte glucose 6-phosphate 
dehydrogenase activity (2), and peripheral neuropathy of isoniazid 
and inherited differences in acetylation of this medication (3). Such 
observations gave rise to the field of "pharmacogenetics," which 
focuses largely on genetic polymorphisms in drug-metabolizing 
enzymes and how this translates into inherited differences in drug 
effects [reviewed in (4)]. 

The molecular genetic basis for these inherited traits began to be 
elucidated in the late 1980s, with the initial cloning and 
characterization of a polymorphic human gene encoding the 
drug-metabolizing enzyme debrisoquin hydroxylase (CYP2D6) (5). 
Genes are considered functionally "polymorphic" when allelic 
variants exist stably in the population, one or more of which alters 
the activity of the encoded protein in relation to the wild-type 


sequence. In many cases, the genetic poly- 
morphism is associated with reduced activity 
of the encoded protein, but there are also 
examples where the allelic variant encodes 
proteins with enhanced activity. Since the 
cloning and characterization of CYP2D6, human genes involved in 
many such pharmacogenetic traits have been isolated, their 
molecular mechanisms have been elucidated, and their clinical 
importance has been more clearly defined. Inherited differences in 
drug-metabolizing capacity are generally monogenic traits, and 
their influence on the pharmacokinetics and pharmacologic effects 
of medications is determined by their importance for the activation 
or inactivation of drug substrates. The effects can be profound 
toxicity for medications that have a narrow therapeutic index and 
are inactivated by a polymorphic enzyme (for example, 
mercaptopurine, azathioprine, thioguanine, and fluorouracil) (6) or 
reduced efficacy of medications that require activation by an 
enzyme exhibiting genetic polymorphism (such as codeine) (7). 
However, the overall pharmacologic effects of medications are 
typically not monogenic traits; rather, they are determined by the 
interplay of several genes encoding proteins involved in multiple 
pathways of drug metabolism, disposition, and effects. The 
potential polygenic nature of drug response is illustrated in Fig. 1, 
which depicts the hypothetical 
effects of two polymorphic genes: one that 
determines the extent of drug inactivation and 



Fig. 1. Polygenic deter- 
minants of drug effects. 
The potential conse- 
quences of administer- 
ing the same dose of a 
medication to individu- 
als with different drug- 
metabolism genotypes 
and different drug-re- 
ceptor genotypes is il- 
lustrated. Active drug 
concentrations in sys- 
temic circulation are 
determined by the indi- 
vidual's drug-metabo- 
lism genotype (green 
lettering), with (A) ho- 
mozygous wild type 
(wt/wt) patients con- 
verting 70% of a dose 
to the inactive metab- 
olite, leaving 30% to 
exert an effect on the 
target receptor. (B) For 
the patient with het- 
erozygous (wt/m) drug- 
metabolism genotype, 
35% is inactivated, 
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another that determines the sensitivity of the 
drug receptor. The polymorphic drug-metab- 
olizing enzyme, which exhibits codominant 
inheritance (that is, three phenotypes), deter- 
mines the plasma concentrations to which 
each individual is exposed, whereas the poly- 
morphic receptor determines the nature of 
response at any given drug concentration. 
This example assumes that drug toxicity (Fig. 
1, red lines) is determined by nonspecific 
effects or through receptors that do not ex- 
hibit functionally important genetic polymor- 
phisms, although clearly toxicity can also be 
determined by genetic polymorphisms in 
drug receptors. Thus, the individual with ho- 
mozygous wild-type drug-metabolizing en- 
zymes and drug receptors (Fig. 1A) would 
have a high probability of therapeutic effica- 
cy and a low probability of toxicity, in con- 
trast to an individual with homozygous mu- 
tant genotypes for the drug-metabolizing en- 
zyme and the drug receptor, in which the 
likelihood of efficacy is low and that of tox- 
icity is high (Fig. 1C). 
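
The first half of the hypothetical model in Fig. 1, how the drug-metabolism genotype sets exposure, reduces to simple arithmetic. The sketch below uses only the two inactivation percentages that survive in the Fig. 1 caption as reproduced here (70% for wt/wt, 35% for wt/m); the 100-mg dose and the function name are assumptions for illustration, and the receptor genotype is not modeled.

```python
# Fraction of a dose converted to inactive metabolite, by genotype.
# Values come from the Fig. 1 caption; the homozygous mutant (m/m) figure
# is not present in this excerpt and is deliberately left out.
INACTIVATED_FRACTION = {"wt/wt": 0.70, "wt/m": 0.35}


def active_dose(dose_mg, genotype):
    """Amount of drug (mg) left to act on the target receptor."""
    if genotype not in INACTIVATED_FRACTION:
        raise KeyError(f"no inactivation value available for {genotype!r}")
    return dose_mg * (1.0 - INACTIVATED_FRACTION[genotype])


DOSE_MG = 100.0  # hypothetical dose, chosen so the result reads as a percentage

for genotype in INACTIVATED_FRACTION:
    print(genotype, round(active_dose(DOSE_MG, genotype), 1))
```

Under these numbers a heterozygous patient is exposed to roughly twice as much active drug as a homozygous wild-type patient given the same dose.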

Such polygenic traits are more difficult to 
elucidate in clinical studies, especially when 
a medication's metabolic fate and mecha- 
nisms of action are poorly defined. However, 
biomedical research is rapidly defining the 
molecular mechanisms of pharmacologic ef- 
fects, genetic determinants of disease patho- 
genesis, and functionally important polymor- 
phisms in genes that govern drug metabolism 
and disposition. Moreover, the Human Ge- 
nome Project, coupled with functional genom- 



[Fig. 2, phase I panel labels: DPD, ALDH, epoxide hydrolase, CYP1A1/2, CYP1B1, CYP2A6, CYP2B6, CYP2C8] 



ics and high-throughput screening methods, is 
providing powerful new tools for elucidating 
polygenic components of human health and 
disease. This has spawned the field of "phar- 
macogenomics", which aims to capitalize on 
these insights to discover new therapeutic tar- 
gets and interventions and to elucidate the con- 
stellation of genes that determine the efficacy 
and toxicity of specific medications. In this 
context, pharmacogenomics refers to the entire 
spectrum of genes that determine drug behavior 
and sensitivity, whereas pharmacogenetics is 
often used to define the more narrow spectrum 
of inherited differences in drug metabolism and 
disposition, although this distinction is arbitrary 
and the two terms are now commonly used 
interchangeably. Ultimately, knowledge of the 
genetic basis for drug disposition and response 
should make it possible to select many medica- 
tions and their dosages on the basis of each 
patient's inherited ability to metabolize, elimi- 
nate, and respond to specific drugs. Herein, we 
provide examples that illustrate the current sta- 
tus of such pharmacogenomic research and dis- 
cuss the prospects for near-term advances in 
this field. 

Genetic Polymorphisms in Drug 
Metabolism and Disposition 

Until recently, clinically important genetic 
polymorphisms in drug metabolism and dis- 
position were typically discovered on the ba- 
sis of phenotypic differences among individ- 
uals in the population (8), but the framework 
for discovery of pharmacogenetic traits is 
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Fig. 2. Most drug-metabolizing enzymes exhibit clinically relevant genetic polymorphisms. Essen- 
tially all of the major human enzymes responsible for modification of functional groups [classified 
as phase I reactions (left)] or conjugation with endogenous substituents [classified as phase II 
reactions (right)] exhibit common polymorphisms at the genomic level; those enzyme polymor- 
phisms that have already been associated with changes in drug effects are separated from the 
corresponding pie charts. The percentage of phase I and phase II metabolism of drugs that each 
enzyme contributes is estimated by the relative size of each section of the corresponding chart. 
ADH, alcohol dehydrogenase; ALDH, aldehyde dehydrogenase; CYP, cytochrome P450; DPD, 
dihydropyrimidine dehydrogenase; NQO1, NADPH:quinone oxidoreductase or DT diaphorase; 
COMT, catechol O-methyltransferase; GST, glutathione S-transferase; HMT, histamine methyl- 
transferase; NAT, N-acetyltransferase; STs, sulfotransferases; TPMT, thiopurine methyltransferase; 
UGTs, uridine 5'-triphosphate glucuronosyltransferases. 



rapidly changing. With recent advances in 
molecular sequencing technology, gene poly- 
morphisms [such as single-nucleotide poly- 
morphisms (SNPs), and especially SNPs that 
occur in gene regulatory or coding regions 
(cSNPs)] may be the initiating discoveries, 
followed by biochemical and, ultimately, 
clinical studies to assess whether these 
genomic polymorphisms have phenotypic 
consequences in patients. This latter frame- 
work may permit the elucidation of polymor- 
phisms in drug-metabolizing enzymes that 
have more subtle, yet clinically important 
consequences for interindividual variability 
in drug response. Such polymorphisms may 
or may not have clear clinical importance for 
affected medications, depending on the mo- 
lecular basis of the polymorphism, the ex- 
pression of other drug-metabolizing enzymes 
in the patient, the presence of concurrent 
medications or illnesses, and other polygenic 
clinical features that impact upon drug re- 
sponse. In Fig. 2, we have highlighted those 
drug-metabolizing enzymes known to exhibit 
genetic polymorphisms with incontrovertible 
clinical consequences; however, almost every 
gene involved in drug metabolism is subject 
to common genetic polymorphisms that may 
contribute to interindividual variability in 
drug response. Table 1 provides examples of 
how these genetic polymorphisms can trans- 
late into clinically relevant inherited differ- 
ences in drug disposition and effects, a com- 
prehensive summary of which is available at 
www.sciencemag.org/feature/data/1044449.shl. 

All pharmacogenetic polymorphisms stud- 
ied to date differ in frequency among ethnic and 
racial groups. In fact, the slow acetylator phe- 
notype was originally suspected to be geneti- 
cally determined because of the difference in 
frequency of isoniazid-induced neuropathies 
observed in Japan versus those observed in the 
United States (9). The marked racial and ethnic 
diversity in the frequency of functional poly- 
morphisms in drug- and xenobiotic-metaboliz- 
ing enzymes dictates that race be considered in 
studies aimed at discovering whether specific 
genotypes or phenotypes are associated with 
disease risk or drug toxicity. 

It is now well recognized that adverse drug 
reactions may be caused by specific drug-me- 
tabolizer phenotypes. This is illustrated by the 
severe and potentially fatal hematopoietic tox- 
icity that occurs when thiopurine methyltrans- 
ferase-deficient patients are treated with stan- 
dard doses of azathioprine or mercaptopurine 
(6). Another example is the slow acetylator 
phenotype that has been associated with hydral- 
azine-induced lupus, isoniazid-induced neurop- 
athies, dye-associated bladder cancer, and sul- 
fonamide-induced hypersensitivity reactions (9, 
10); in all cases, acetylation of a parent drug or 
an active metabolite is an inactivating pathway. 
N-Acetyltransferase is an enzyme that conju- 
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case, in the metabolism and disposition of med- 
ications, and in the targets of drug therapy. 
Such diagnostics, which need be performed 
only once for each battery of genes tested, can 
then become the blueprint for individualizing 
drug therapy. This is illustrated in Fig. 3, which 
depicts various genes that could be genotyped 
to guide the selection and dosing of chemother- 
apy for a patient with acute lymphoblastic leu- 
kemia (ALL). It is already known that genetic 
polymorphisms in drug-metabolizing enzymes 
can have a profound effect on toxicity and 
efficacy of medications used to treat ALL (6) 
and that individualizing drug dosages can im- 
prove clinical outcome (30). It has also been 
established that the genotype of leukemic lym- 
phoblasts is an important prognostic variable 
that can be used to guide the intensity of treat- 
ment (31). Furthermore, genetic polymor- 
phisms are also known to exist for cytokines 
and other determinants of host susceptibility to 
pathogens, and polymorphisms in cardiovascu- 
lar, endocrine, and other receptors may be im- 
portant determinants of an individual's suscep- 
tibility to drug toxicity. Putting all of these 
molecular diagnostics on an "ALL chip" would 
provide the basis for rapidly and objectively 
selecting therapy for each patient. These exam- 
ples represent our current, relatively poor, un- 
derstanding of genetic determinants of leuke- 
mia therapy and host sensitivity to treatment; 
ongoing studies will provide important insights 
that should substantially enhance the utility of 
such pharmacogenomic strategies for ALL and 
many other human illnesses. 
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Mismatch repair detection (MRD) is an in vivo method that uses a change in bacterial colony color to detect 
DNA sequence variation. DNA fragments to be screened for variation are cloned into two MRD plasmids, 
and bacteria are transformed with heteroduplexes of these constructs. The resulting colonies are blue in the 
absence of a mismatch and white in the presence of a mismatch. MRD is capable of detecting a single 
mismatch in a DNA fragment as large as 10 kb in size. In addition, MRD has the potential for analyzing many 
fragments simultaneously, offering a powerful method for high-throughput genotyping and mutation 
detection in a large genomic region. 



The detection of mutations in genomic DNA 
plays a critical role in efforts to elucidate the ge- 
netic basis of human disease. Although many ap- 
proaches are currently applied to the problem of 
mutation detection, no single technique pro- 
vides a rapid method for screening large stretches 
of genomic DNA with high sensitivity and spec- 
ificity (Grompe 1993). We have developed an in 
vivo bacterial assay, mismatch repair detection 
(MRD), that utilizes the Escherichia coli methyl- 
directed mismatch repair system to detect single- 
base mismatches in DNA. Unlike other DNA vari- 
ation detection techniques, MRD can detect a 
single-base mismatch in up to 10 kb of DNA. In 
addition, MRD has the potential to examine 
many different DNA fragments simultaneously, 
providing a rapid method for screening large 
stretches of DNA for nucleotide sequence varia- 
tion. 

The normal function of the E. coli methyl- 
directed mismatch repair system is to correct er- 
rors in newly synthesized DNA resulting from im- 
perfect DNA replication (Wagner and Meselson 
1976). The system distinguishes unreplicated 
from newly replicated DNA by taking advantage 
of the fact that methylation of adenine in the 
sequence GATC occurs in unreplicated DNA but 
not in newly replicated DNA. Mismatch repair is 
initiated by the action of three proteins, MutS, 
MutL, and MutH, which lead to nicking of the 
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unmethylated, newly replicated DNA strand at a 
hemimethylated GATC site. The unmethylated 
DNA strand is then digested and resynthesized in 
a replication reaction in which the methylated 
strand is used as a template (Modrich 1991). The 
methyl-directed mismatch repair system can re- 
pair single-base mismatches and loops up to 3 
nucleotides in length. Loops of 5 nucleotides and 
larger are not repaired (Parker and Marinus 
1992). We have taken advantage of the inability 
of the mismatch repair system to repair loops of 5 
nucleotides or greater to design two vectors that 
allow in vivo mismatch repair to be detected vi- 
sually as a change in bacterial colony color. 

Figure 1 The MRD vectors. pMF100 and pMF200 are derived from pUC19 with the multiple cloning
site displaced from the lacZα region (Yanisch-Perron et al. 1985). In addition, the MRD vectors
contain the BglI fragment (2166-472) and most of the multiple cloning site of pBluescript
(Short et al. 1988). The multiple cloning sites of the MRD vectors do not have sites for the
restriction enzymes XbaI, SpeI, BamHI, SmaI, and ApaI; the EcoRI site is not unique. pUC19
multiple cloning sites (nucleotides 400-454) were replaced using 70-nucleotide-long
oligonucleotides with a sequence containing 4 GATC sites. In addition, the sequence replacing
the pUC19 multiple cloning sites in pMF200 has a 5-bp insertion as compared to pMF100 creating
a nonfunctional lacZα in pMF200. The label loop is to indicate this difference between pMF100
and pMF200.

Two pUC-derived plasmids, the blue (pMF200) and the white (pMF100) plasmid, are used in the
MRD procedure. These plasmids are identical except for a 5-bp insertion into the lacZα gene of
pMF100 (Fig. 1). This insertion results in white colonies when bacteria transformed with the
plasmid are grown on LB plates supplemented with indolyl-β-D-galactoside (X-gal) and
isopropyl-β-D-thiogalactoside (IPTG). In contrast, bacteria transformed with the blue plasmid
produce blue colonies when grown under these conditions. The initial step of the MRD procedure
(Fig. 2) consists of cloning one of the two DNA fragments to be screened for differences into
the blue plasmid and cloning the second DNA fragment into the white plasmid. The blue plasmid
construct is then transformed into a dam⁻ bacterial strain, resulting in a completely
unmethylated plasmid, whereas the white plasmid construct is transformed into a dam⁺ bacterial
strain, resulting in a fully methylated plasmid. The two plasmids are then linearized,
denatured, and reannealed, resulting in two heteroduplex and two homoduplex plasmids.
Following digestion with MboI, which digests only unmethylated homoduplexes, and DpnI, which
digests only fully methylated homoduplexes, the remaining hemimethylated heteroduplexes are
circularized, transformed into E. coli, and plated onto agar supplemented with X-gal and IPTG.
In the absence of a mismatch between the two test DNA fragments, the 5-nucleotide loop in the
lacZα gene that results from heteroduplex formation between the white and the blue plasmids is
not repaired by the mismatch repair system. Subsequent plasmid replication produces both white
and blue plasmids in a single colony, leading to a blue color. In contrast, if a mismatch is
present in the heteroduplex DNA, a corepair event takes place that involves both the mismatch
in the DNA and the 5-nucleotide loop in the lacZα gene. In this case, the unmethylated lacZα
gene on the blue plasmid is degraded and replaced by the lacZα gene from the methylated strand
of the white plasmid, resulting in a white colony. Previous in vivo studies have suggested
that the corepaired segment of DNA is at least 1.5 kb (Carraway and Marinus 1993). We have
found that corepair of a mismatch and the lacZα gene in the MRD system occurs even when the
distance between them is 5 kb (see below).
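
The selection and read-out logic of the procedure can be summarized in a short Python sketch. It is illustrative only: the function names and the boolean encoding of strand methylation are assumptions introduced here, and repair is treated as all-or-none, which real colony counts are not.

```python
from itertools import product

def survives_digestion(top_methylated: bool, bottom_methylated: bool) -> bool:
    """MboI digests fully unmethylated duplexes and DpnI digests fully
    methylated ones, so only hemimethylated duplexes stay intact."""
    return top_methylated != bottom_methylated

def expected_colony_color(mismatch_in_insert: bool) -> str:
    """Idealized read-out: a mismatch triggers corepair of the lacZ-alpha
    loop toward the methylated (white) strand; without a mismatch the loop
    is left alone and both plasmids segregate, giving a blue colony."""
    return "white" if mismatch_in_insert else "blue"

# Strand origin -> methylation state (blue plasmid from dam-, white from dam+).
strand_methylation = {"blue": False, "white": True}

# All four duplex species formed on reannealing, filtered by the double digest.
survivors = [
    (top, bottom)
    for top, bottom in product(strand_methylation, repeat=2)
    if survives_digestion(strand_methylation[top], strand_methylation[bottom])
]

print(survivors)
print(expected_colony_color(mismatch_in_insert=True))
```

Only the two blue/white heteroduplex species pass the filter, which is why every transformed molecule carries the lacZα loop as a built-in reporter.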

RESULTS 

Testing Known Mutations 

As an initial test of the sensitivity and specificity of the MRD system, we tested the
detection of a single-nucleotide mismatch in a 550-bp DNA fragment derived from the promoter
of the mouse β-globin gene (Myers et al. 1985a). We used MRD to compare this DNA fragment,
which contains a T at position -49 relative to the functional transcription start site of the
gene, with a second DNA fragment identical in sequence except for a C at position -49. In this
experiment, the mismatch was located ~700 bp from the 5-nucleotide lacZα loop in the vector.
Comparison of the two DNA molecules by using MRD resulted in 90% white colonies. In contrast,
comparison of two DNA molecules with no mismatch (-49T/-49T) resulted in only 7% white
colonies (Table 1; Figs. 2 and 3). Comparison of all of the possible different
single-nucleotide mismatches at position -49 using MRD revealed proportions of white colonies
ranging from 80% to 90% (Table 1; Figs. 2 and 3). These results demonstrate that MRD can
detect all of the different DNA variations possible at this position with high efficiency.

In an effort to establish the generality of the above results, we used the MRD system to
detect a total of five additional single-nucleotide mismatches in two different DNA fragments
(Table 1). Four of these mismatches are at different nucleotide positions in the human
cystathionine β-synthase gene (Kruger and Cox 1995). The remaining mismatch represents a
single-nucleotide change in the human agouti gene (Wilson et al. 1995). In each case, we were
able to detect the single-nucleotide mismatch (Table 1).

The detection of a single mismatch in 10 kb of DNA

In the experiments described above, we were surprised to observe that we were able to detect
the mismatch even when it was as far from the loop as 2.3 kb. In addition, because the
proportion of white colonies was >50%, corepair of the mismatch and the loop occurred
irrespective of which side the mismatch was located relative to the loop on the unmethylated
strand. In an effort to determine whether the efficiency of mismatch detection would remain
high if the distance between a mismatch and the vector loop was even larger, we performed the
following experiment. A 9-kb test DNA fragment derived from bacteriophage λ was cloned into
the MRD plasmid system and compared with the same test DNA containing a 2-bp insertion located
5 kb from one end of the fragment. Because DNA molecules used for transformation are circular,
a mismatch in a 10-kb fragment is always within 5 kb of the loop. The mismatch in this
experiment was at least 5 kb away from the loop in either direction. In the presence of the
2-bp loop, 70% white colonies were produced, as compared with 10% white colonies in the
absence of the mismatch. These results indicate that MRD can detect a mismatch in 10 kb of
DNA.
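
The geometric argument about circular molecules can be checked with a small Python sketch. The function name and the 10-kb circumference used in the example are illustrative assumptions; the real plasmid also contains vector sequence, so its circumference is somewhat larger than the insert alone.

```python
def circular_distance_kb(pos_a_kb: float, pos_b_kb: float, circumference_kb: float) -> float:
    """Shortest distance between two positions on a circular molecule."""
    linear = abs(pos_a_kb - pos_b_kb) % circumference_kb
    # Going the other way around the circle may be shorter.
    return min(linear, circumference_kb - linear)

# With the loop at position 0 on a hypothetical 10-kb circle, no position
# can be farther than half the circumference from the loop.
farthest = max(circular_distance_kb(0.0, x / 10.0, 10.0) for x in range(0, 100))
print(farthest)
```

On a circle, a site 7 kb from the loop in one direction is only 3 kb away in the other, which is the sense in which a mismatch is never farther from the loop than half the circumference.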

Detecting variation in PCR products 

Next, we investigated the utility of MRD for detecting unknown mutations in genomic DNA
fragments generated by the polymerase chain reaction (PCR). PCR is a practical method for
obtaining a particular genomic DNA fragment of interest from many different individuals.
Recent advances in PCR technology make it possible to isolate DNA products >10 kb in length
(Barnes 1994; Cheng et al. 1994). However, the introduction of errors during the PCR reaction
severely limits the use of individual cloned PCR products for mutation detection, particularly
in the case of long PCR products. In an effort to overcome this limitation, we have developed
a protocol that uses MRD to enrich for molecules that are free of PCR-induced errors.
Following this "cleaning" protocol, the cloned PCR products can be compared for DNA sequence
differences by using the MRD procedure described above.

The basic principle underlying the MRD cleaning protocol is the fact that any single
PCR-induced mutation makes up a very small fraction of all the molecules generated by PCR. As
a result, when the products of a PCR reaction are cloned into the blue and the white MRD
vectors and assayed as described above, the majority of products containing PCR-induced errors
are present as heteroduplex molecules containing a mismatch and produce white colonies. In
contrast, those PCR products with no PCR-induced errors contain no mismatches and result in
blue colonies. Given that not all mismatches are repaired with 100% efficiency, some blue
colonies can be expected to contain PCR-induced errors following the first round of
enrichment. However, if blue colonies are isolated and used in a second round of MRD cleaning,
those molecules containing PCR-induced errors can be reduced even further.

Table 1. Detection of Known Point Mutations Using MRD

Variation (a)   Fragment size (b)   Distance from loop (b)   Percent white (c)
None (d)        0.55                N.A.                     7
T/C (d)         0.55                0.7                      89
A/T (d)         0.55                0.7                      84
G/T (d)         0.55                0.7                      82
A/C (d)         0.55                0.7                      80
G/C (d)         0.55                0.7                      90
None (e)        2.0                 N.A.                     8
A/C (e)         2.0                 0.4                      35
None (f)        2.2                 N.A.                     10
C/T (f)         2.2                 2.3                      83
G/A (f)         2.2                 2.1                      86
C/T (f)         2.2                 1.6                      81
T/C (f)         2.2                 1.8                      80

(a) (A/T) At the only position of variation between the two fragments compared, the
dam⁻-grown variant has an A and the dam⁺-grown variant has a T at the same position on the
same strand. Therefore, mismatches produced in such an experiment are A/A and T/T.
(b) In kilobases. (N.A.) Not applicable.
(c) At least 250 colonies were counted to determine the percentage.
(d) Experiment using fragment of the mouse β-globin gene.
(e) Experiment using fragment of the human agouti gene.
(f) Experiment using fragment of the human cystathionine β-synthase gene. The mutations are,
in the order listed: C341T, G502A, C992T, and T833C.

Figure 3 Transformation plates of the different mismatches at position -49 of the mouse
β-globin promoter. The plate labeled T/T, containing a majority of blue colonies, represents
the transformation with the nonmismatched control; the remaining plates, containing a majority
of white colonies, represent transformation with mismatched molecules. Nomenclature of the
different comparisons is as described in Table 1.



Figure 2 The MRD procedure. (a) Formation of the heteroduplex. DNA from the unmethylated blue plasmid
and the methylated white plasmid containing the fragments to be screened are linearized, denatured, and
reannealed. The resulting molecules are fully unmethylated blue plasmid homoduplex, fully methylated white
plasmid homoduplex, and hemimethylated heteroduplexes (two populations of heteroduplexes are formed).
Only the heteroduplex molecules are left intact after treatment with MboI, which digests fully unmethylated
DNA, and DpnI, which digests fully methylated DNA. (b) Introduction of the heteroduplex into E. coli and
detection of the variation. The heteroduplex molecules prepared in a are circularized with T4 DNA ligase and
transformed into E. coli. In the absence of a mismatch, DNA replication in the bacteria generates both the blue
and the white plasmid, producing a blue colony. In the presence of a mismatch, repair of the unmethylated blue
strand of the heteroduplex using the white strand as a template generates the white plasmid only, producing a
white colony.
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Because each blue colony contains both a blue 
and a white MRD plasmid (see above), the second 
round of MRD cleaning is carried out as follows. 

Plasmid DNA isolated from blue colonies following the first round of cleaning is used to
transform both a dam⁻ and a dam⁺ bacterial strain. Although both blue and white colonies
result from each transformation, only the blue colonies are isolated from the dam⁻
transformation and only the white colonies are isolated from the dam⁺ transformation. Plasmid
DNA is prepared from such colonies, and heteroduplexes are isolated as described above. Blue
colonies arising from transformation with these heteroduplexes are enriched further for the
products free of PCR-induced error. For example, in an experiment in which 75% of molecules
contain one or more PCR-induced errors following PCR, assuming 95% efficiency of mismatch
repair and 10% frequency of white colonies in the absence of a mismatch, the expectation would
be 10% blue colonies following one round of MRD enrichment, with 66% of the molecules in such
colonies free of PCR-induced errors. If the plasmid DNA from the blue colonies were used for a
second round of MRD enrichment, the expectation would be 41% blue colonies, with 96% of the
molecules in such colonies free of PCR-induced errors.
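
The arithmetic behind this example can be reconstructed with a short Python sketch. The model is an assumption made here, not a formula stated in the text: the two strands of each heteroduplex are drawn independently, a heteroduplex is mismatch-free only when both strands are error-free, and the 10% background of white colonies applies regardless of mismatch status. The function and parameter names are hypothetical.

```python
def mrd_cleaning_round(error_free: float,
                       repair_efficiency: float = 0.95,
                       background_white: float = 0.10):
    """Return (fraction of blue colonies, fraction of error-free molecules
    in blue colonies) after one round of MRD cleaning."""
    no_mismatch = error_free ** 2            # both strands error-free
    mismatch = 1.0 - no_mismatch             # at least one strand carries an error
    # Share of error-free molecules among mismatched duplexes: p(1-p) / (1 - p^2).
    free_in_mismatched = error_free * (1.0 - error_free) / mismatch if mismatch else 0.0
    unrepaired = mismatch * (1.0 - repair_efficiency)   # mismatches that escape repair
    would_be_blue = no_mismatch + unrepaired
    blue = (1.0 - background_white) * would_be_blue
    free_in_blue = (no_mismatch + unrepaired * free_in_mismatched) / would_be_blue
    return blue, free_in_blue

# Start from 75% of molecules carrying at least one PCR-induced error.
blue_1, free_1 = mrd_cleaning_round(error_free=0.25)
# Feed the unrounded first-round purity into the second round.
blue_2, free_2 = mrd_cleaning_round(error_free=free_1)

print(f"round 1: {blue_1:.1%} blue, {free_1:.1%} error-free")
print(f"round 2: {blue_2:.1%} blue, {free_2:.1%} error-free")
```

Under these assumptions the values round to the 10% and 66% quoted for the first round and the 41% and 96% quoted for the second, which suggests the example in the text follows a calculation of this kind.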

As a test of the practicality and the efficiency of the MRD cleaning protocol, we isolated a
2-kb human chromosome 21-specific PCR product from each of the two chromosome 21 homologs of a
single individual. The two chromosome 21 homologs were separated from each other in
independent hamster-human somatic cell hybrid clones. Genomic DNA isolated from these somatic
cell hybrid clones was the template for the PCR reactions. When the PCR products derived from
each homolog were compared by using MRD as described above, ~10% blue colonies were observed
in each case. Following two rounds of MRD cleaning, the proportion of blue colonies was
60%-80% (Table 2). In contrast, when these "cleaned" PCR products derived from the two
homologs were compared with each other by using MRD, ~90% of the resulting colonies were
white, indicating the presence of at least one single-base difference in the 2-kb PCR products
derived from the two different chromosome 21 homologs. We have demonstrated independently the
presence of at least one DNA sequence variation in these 2-kb PCR products by finding a HinPI
restriction fragment length polymorphism (RFLP) (data not shown). These results demonstrate
that MRD can be used to enrich for PCR products that are largely free of PCR-induced errors
and that such products can be used in conjunction with MRD to detect human DNA sequence
variation. In addition, we have used MRD in conjunction with the high-fidelity polymerase,
Pfu, to analyze 2-kb PCR products for DNA variations without the need to perform PCR cleaning
(M. Faham and D.R. Cox, unpubl.).

Table 2. Detection of DNA Variation Following PCR Cleaning

Variants compared (a)   Experimental stage       Percent white colonies (b)
1/1                     no cleaning              >90
2/2                     no cleaning              >90
B1/B1                   one cleaning round       70
B2/B2                   one cleaning round       64
BlueB1/BlueB1           two cleaning rounds      38
BlueB2/BlueB2           two cleaning rounds      21
BlueB1/BlueB2           testing clean products   >90
BlueB2/BlueB1           testing clean products   >90

(a) 1 and 2 represent PCR products from two different homologs of chromosome 21 isolated in
hamster-human somatic cell hybrids. (1/1) Comparison of a blue plasmid grown in a dam⁻ strain
containing the PCR product from hybrid 1 with a white plasmid grown in a dam⁺ strain
containing the PCR product from hybrid 1. (B1/B1) Comparison of blue dam⁻-grown plasmids
obtained from the blue colonies of comparison 1/1 to white dam⁺-grown plasmids obtained from
the same source. (BlueB1/BlueB1) Comparison of blue dam⁻-grown plasmids obtained from the blue
colonies of comparison B1/B1 to white dam⁺-grown plasmids from the same source.
(BlueB1/BlueB2) Comparison of blue dam⁻-grown plasmids from the blue colonies of comparison
B1/B1 to white dam⁺-grown plasmids obtained from the blue colonies of comparison B2/B2.
(b) At least 200 colonies were counted to determine the percentage.

DISCUSSION 

Current techniques for detecting unknown mutations in genomic DNA fall into three general
classes. The first class of techniques, which includes single-strand conformational
polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and heteroduplex
analysis in gel matrices (Myers et al. 1985b,c; Orita et al. 1989a,b; Sheffield et al. 1989;
Perry and Carrell 1992; White et al. 1992), detects conformational changes created by DNA
sequence variation as alterations in electrophoretic mobility. These techniques are limited by
the need to determine optimum reaction conditions for each DNA fragment and by a marked
decrease in sensitivity with increasing DNA fragment size. The second class of techniques,
which includes RNase A cleavage, chemical mismatch cleavage (CMC), and enzyme mismatch
cleavage (EMC), uses chemicals or proteins to detect sites of sequence mismatch in
heteroduplex DNA (Myers et al. 1985d; Cotton et al. 1988; Maschal et al. 1995; Youil et al.
1995). These techniques can be used to assay many different DNA fragments with a single set of
assay conditions. In addition, they can be used to detect mutations in larger DNA fragments.
However, even with this second class of techniques, the upper limit for the size of the
screened DNA fragment is ~1 kb.

Unlike all of the techniques described above, 
which involve in vitro analyses of DNA to detect 
sequence variation, MRD utilizes an in vivo assay 
for detecting unknown mutations in genomic 
DNA. We have used this system to analyze a variety of heteroduplex molecules with inserts
ranging in size from 550 bp to 9 kb, representing each of the four possible classes of
single-nucleotide substitutions between the strands. All
of the mutations tested, nine of nine point mu- 
tations (Table 1) and three of three deletions of 
2-3 bp (M. Faham and D.R. Cox, unpubl.), could 
be detected unambiguously. Our data indicate 
that MRD can detect mismatches in DNA frag- 
ments of up to 10 kb in size. Thus, MRD over- 
comes one of the major limitations of techniques 
currently available for detecting unknown muta- 
tions. 

In some cases of mutation detection (e.g., 
comparison of a patient's DNA with the patient's 
tumor DNA) and polymorphism detection (e.g., 
identification of a polymorphic marker for map- 
ping a recombination breakpoint), the goal of the 
experiment is to identify a variant DNA frag- 
ment. In such cases, MRD's ability to detect DNA 
variation in long DNA fragments with high sen- 
sitivity is particularly useful. In other cases of mu- 
tation detection in human genomic DNA or high 
throughput genotyping, the experimental goal is 



to identify which one of many variations in a 
large genomic region is the disease-causing vari- 
ation. To achieve this goal, one needs to test for 
the presence of the different identified variations 
in many people from the normal population. In 
such cases, an efficient analysis of many small 
fragments is more beneficial than the analysis of 
a long DNA fragment. MRD is well suited for this 
experimental problem, as the technique can be 
used to analyze many fragments simultaneously 
in a single experiment. In such an experiment, heteroduplexes are made between a pool of
restriction fragments from the genomic region of interest of a "standard" and a "tester"
sample. This is followed by ligation of these heteroduplexes into the hemimethylated MRD
heteroduplex vector and transformation into E. coli. The resulting blue colonies contain DNA
fragments that have no sequence variation between the tester and the standard, whereas the
white colonies contain fragments with sequence differences. DNA prepared from the pool of blue
colonies contains fragments of identity, whereas DNA prepared from the pool of white colonies
contains fragments containing differences. Determination of whether a specific DNA fragment is
present in the white pool or the blue pool indicates whether the fragment contains a
variation. One can use agarose gel electrophoresis of a restriction digest that releases the
insert fragments in the blue and the white pools to determine the pool in which each fragment
is present. We performed this procedure to analyze up to 10 DNA fragments simultaneously for
variation (M. Faham and D.R. Cox, unpubl.). For larger numbers of fragments, resolving the
different fragments on agarose gels would be impractical. Instead, one can use the blue and
white pool DNA as independent hybridization probes on a blot containing DNA from each of the
different fragments. For each dot, the comparison of the hybridization signal produced by the
blue probe with that produced by the white probe determines whether that fragment contains a
variation. Such a procedure has the potential for detecting the presence of mutations in a
region representing hundreds of kilobases of DNA or for genotyping many loci simultaneously.
This approach is similar to that used in the genomic mismatch scanning (GMS) procedure for
identifying regions of the genome identical by descent (Nelson et al. 1993). However, an
important difference between GMS and MRD is that GMS yields a probe only for regions of
identity, whereas MRD yields probes for both regions of identity and regions of difference.
The ability to isolate both types of probes results in improved signal to noise as compared to
the use of a single probe. Although the manual isolation of blue from white colonies becomes
impractical as the number of colonies becomes large, the use of an automatic cell sorter or a
selectable system to isolate blue from white colonies should allow for the analysis of very
large numbers of colonies.
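
The two-probe read-out described above can be sketched in Python as a per-fragment comparison of the white-probe and blue-probe signals. Everything specific in this sketch is hypothetical: the function name, the pseudocount, the twofold threshold, and the signal values are illustrative assumptions rather than measured data or a published scoring rule.

```python
import math

def call_fragments(signals, min_log2_ratio=1.0, pseudocount=1.0):
    """Classify each dot from its (blue_probe, white_probe) signal pair.

    A fragment enriched in the white pool is called "variant", one enriched
    in the blue pool is called "identical", and anything in between is left
    "ambiguous". The pseudocount avoids division by zero for empty dots.
    """
    calls = {}
    for name, (blue, white) in signals.items():
        log_ratio = math.log2((white + pseudocount) / (blue + pseudocount))
        if log_ratio >= min_log2_ratio:
            calls[name] = "variant"
        elif log_ratio <= -min_log2_ratio:
            calls[name] = "identical"
        else:
            calls[name] = "ambiguous"
    return calls

# Hypothetical hybridization intensities in arbitrary units: (blue probe, white probe).
example_signals = {
    "fragment_A": (900.0, 100.0),
    "fragment_B": (80.0, 850.0),
    "fragment_C": (400.0, 450.0),
}
calls = call_fragments(example_signals)
print(calls)
```

Using the ratio of the two probes rather than either signal alone reflects the signal-to-noise argument made in the text: a fragment that hybridizes weakly with both probes is not mistaken for a call in either direction.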

The detection of DNA variation by MRD is limited by the ability to obtain the specific DNA
fragments to be analyzed from the patients of interest. However, several approaches are
presently available to isolate the necessary DNA fragments, including long-range PCR with
high-fidelity enzymes (e.g., Pfu) (Neilson et al. 1995), RecA-assisted cleavage (Ferrin and
Camerini-Otero 1991), and the use of a single set of oligonucleotide primers to amplify
multiple specific fragments simultaneously by PCR (Brookes et al. 1994). In conjunction with
these methods for isolating specific genomic DNA fragments, MRD provides a powerful technique
for the detection of unknown mutations, the detection of DNA variation in large genomic
regions, and high-throughput genotyping.

METHODS 

Construction of MRD Vectors 

pNEB193 (New England Biolabs) was digested with AseI, and the larger two fragments were
ligated to each other leading to a construct, pNEB133, with a deletion of ~60 bp. pBCKS
(Stratagene) was digested with SpeI, filled in with Klenow, digested with SmaI, and
recircularized with T4 DNA ligase. The resultant clone ΔB was resistant to cleavage by XbaI in
addition to the other expected enzymes (SpeI, BamHI, and SmaI). BssHII digest of ΔB was
performed, and the smaller band (~150 bp) was gel eluted and ligated to r/H-digested and
Klenow-filled pNEB133. The resultant clone, pNEB133B, had the orientation of T3 being the far
end from the lacZα gene. The BspHI fragment containing the ampicillin gene of pNEB133B was
replaced with a PCR-generated fragment carrying the chloramphenicol-resistance gene producing
pMFO. pNEB133B was digested with ApaI, the 3' overhang chewed with T4 DNA polymerase, a BglII
linker added, and recircularization performed with T4 DNA ligase producing the clone
pNEB133BB. The BglI fragment containing the M13 origin of replication from pBluescript was
inserted instead of the BglI fragment of pNEB133BB, producing the clone pBBM. Two
complementary oligonucleotides were cloned into an EcoRI-HindIII-digested pNEB133. One
oligonucleotide had the following sequence:
5'-AATTCTGCACGGATCCACGCGATCGCTCTGATCAGCAGATCTCACTGGTGACCTCTTAATTAACAGCATGC-3'.
The other oligonucleotide had the complementary sequence except it had the sequence AGCT as an
overhang at its 5' end and it lacked a complement to the last 4 nucleotides of the 5' end of
the first oligonucleotide. The resultant clone, pOI, was digested with BclI and EcoRI and
ligated to 2 complementary oligonucleotides that are identical to the sequence deleted in pOI
except that they lack the 5 nucleotides GCACG, destroying the BamHI site and producing a
clone, pOII, that has a deletion just upstream of the EcoRI site that makes lacZα in-frame.
The AflIII-BglI fragment of pMFO was replaced with the AflIII-BglI fragments of pOI and pOII
producing two chloramphenicol-resistant clones that are identical except for a 5-bp insertion.
The two clones were named pMF2 and pMF2-5; pMF2 produces white colonies in the proper medium
and is the product of pOI, and pMF2-5 produces blue colonies in the proper medium and is the
product of pOII. The smaller EcoRI fragment of pBBM was replaced with the small EcoRI
fragments of pMF2 and pMF2-5 producing two ampicillin-resistant clones that are identical
except for a 5-bp insertion.
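
As a quick consistency check on the oligonucleotide described above, the following Python sketch counts its GATC sites and applies the 5-nucleotide GCACG deletion. The sequence is transcribed from the text as printed and may carry transcription errors; the variable names are introduced here for illustration.

```python
# Top-strand oligonucleotide as given in the text (5' to 3').
OLIGO = (
    "AATTCTGCACGGATCCACGC"
    "GATCGCTCTGATCAGCAGATCTCACTG"
    "GTGACCTCT"
    "TAATTAACAGCATGC"
)

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
DAM_SITE = "GATC"      # Dam methylation / MboI / DpnI site

# GATC cannot overlap itself, so a plain substring count is sufficient.
gatc_sites = OLIGO.count(DAM_SITE)

# Remove the single GCACG described as the 5-nucleotide difference.
shortened = OLIGO.replace("GCACG", "", 1)

print("GATC sites:", gatc_sites)
print("BamHI site before deletion:", BAMHI_SITE in OLIGO)
print("BamHI site after deletion:", BAMHI_SITE in shortened)
print("length difference:", len(OLIGO) - len(shortened))
```

The sequence as transcribed contains four GATC sites, matching the description in the Figure 1 legend, and removing GCACG eliminates the BamHI site while leaving all four GATC sites in place.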

Testing Known Mutations 

A 550-bp ClaI-SacI fragment of four variants having A, G, T, or C at position -49 of the mouse
β-globin promoter was cloned into ClaI-SacI-digested pMF100 and pMF200. The T and C variants
were cloned into pMF100; the T, G, and A variants were cloned into pMF200. pMF200 clones were
grown in a dam⁻ strain (SCS110) (Stratagene), and pMF100 clones were grown in a dam⁺ strain
(DH5α) (GIBCO-BRL). The plasmid DNA was linearized with AflIII in 30-µl reactions. About equal
amounts (estimated by gel electrophoresis of the linearized plasmids) were mixed, and the
volume was increased to 100 µl with TE buffer (10 mM Tris, 0.1 mM EDTA). The sample was then
extracted with 100 µl of a 1:1 phenol/chloroform mixture, followed by extraction with 100 µl
of chloroform. Five microliters of 0.5 M EDTA and 12.5 µl of 1 M NaOH were added, and the
reaction was left at room temperature for 15 min. The reaction was neutralized by the addition
of 12.5 µl of 2 M Tris (pH 7.2), 125 µl of formamide was added, and the reaction was incubated
at 30°C for 1 hr. Chloroform extraction was performed twice, followed by ethanol
precipitation. DNA was digested in a 20-µl reaction with MboI (5 units) for 1 hr, and DpnI
(10-20 units) for 10 min at 37°C. The reaction was stopped by the addition of 1 µl of 0.5 mM
EDTA. The intact heteroduplex plasmid was separated from the MboI- or DpnI-digested plasmid by
agarose gel electrophoresis. The DNA was isolated from the gel slice and resuspended in 20 µl
of water. Three microliters was used for a 20-µl recircularization reaction using T4 DNA
ligase overnight at 16°C. Transformation of DH5α was performed with only 30 min recovery at
37°C after the heat shock. Transformation reactions were plated on LB agar plates with 50
µg/ml of carbenicillin, 64 µg/ml of X-gal, and 64 µg/ml of IPTG and incubated at 37°C
overnight.

EcoRI-/Vi/II fragments of the cystathionine β-synthase alleles were obtained from constructs
of Kruger and Cox (1995). Adapters converting the EcoRI overhang to NotI overhangs were
ligated on the fragments. These fragments were subsequently cloned in a NotI-EcoRV-digested
pMF100. In addition, the wild-type control fragment was cloned in the pMF100. The MRD analysis
was performed as above. The control experiment compared the blue vector (pMF200) carrying the
wild-type allele to the white vector (pMF100) carrying the same allele. The test experiment
compared the blue vector carrying the wild-type allele to the white vector carrying another
allele.

KpnI-EcoRV fragments of human agouti alleles were obtained from constructs of Wilson et al.
(1995). These fragments were cloned in KpnI-EcoRV-digested MRD vectors. The wild-type allele
was cloned in both pMF200 and pMF100. The mutant allele was cloned in pMF100 only. The MRD
procedure was performed as described above. The control experiment compared the blue vector
carrying the wild-type allele to the white vector carrying the wild-type allele. The test
experiment compared the blue vector carrying the wild-type allele to the white vector carrying
the mutant allele.

A 9-kb fragment of λ DNA was cloned in pMF1 (a relative vector to pMF2o) producing the clone
pB10. Partial digest with BsrBI was performed. (Two BsrBI sites were present: one in the
chloramphenicol-resistance gene, and the other ~5 kb in the insert.) Fill-in reaction was
performed with Klenow, followed by recircularization with T4 DNA ligase and transformation
into bacteria. Resultant clones were analyzed for the generation of a NruI site at the correct
position; the correct clone was named pB10+2. A NotI-KasI fragment containing the insert of
pB10 was cloned in a NotI-KasI-digested pMF2. The resulting clone pB210 was compared using MRD
to pB10 and pB10+2. The MRD procedure was performed as described above.

Detecting Variation in a PCR Product 

An EcoRI fragment of a cosmid was subcloned in pNEB193. Using sequence information of the
clone, primers were designed to produce a PCR product of ~3 kb in size. The PCR reaction was
performed with the enzyme mixture rTth-Vent (Perkin Elmer). The PCR product was extracted with
an equal volume of a 1:1 mixture of phenol/chloroform and ethanol precipitated. Restriction
digest with HindIII and KpnI was performed producing a fragment of ~2 kb in size. These
fragments were cloned in HindIII-KpnI-digested vectors that are relatives to pMF100 and
pMF200. Only 5% of the transformation mixture was plated and the rest was grown directly in 5
ml of Luria broth (LB) containing 50 µg/ml of carbenicillin. DNA was isolated from the
transformation cultures and DNA of the pMF200 clones was transformed into SCS110 (dam⁻
strain). Five percent of the transformation was plated and the rest grown directly in 5 ml of
LB + 50 µg/ml of carbenicillin. DNA isolated from these cultures was compared with DNA
isolated from the pMF100 clones carrying fragments generated from the same source (i.e., the
same somatic cell hybrid). The MRD procedure was performed as described above. About 50 blue
colonies from this comparison were picked and grown in 5 ml of LB + 50 µg/ml of carbenicillin.
One microliter of a 1:1000 dilution of DNA isolated from these cultures was used to transform
DH5α (dam⁺ strain), and 2 µl of the same dilution was used to transform SCS110 (dam⁻ strain).
White colonies from the first transformation and blue colonies from the second transformation
were picked and grown in 5 ml of LB + 50 µg/ml of carbenicillin. DNA isolated from these
cultures was used to perform the MRD procedure. Subsequent to this second round, blue colonies
were picked and grown, and their DNA was used to transform the two bacterial strains as
described above. These DNA samples were used to compare fragments generated from the same
source and fragments generated from a different source (i.e., the other somatic cell hybrid).
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ABSTRACT DNA fragments 536 base pairs long differing by 
single base-pair substitutions were clearly separated in denaturing 
gradient gel electrophoresis. Transversions as well as transitions 
were detected. The correspondence between the gradient gel 
measurements and the sequence-specific statistical mechanical 
theory of melting shows that mutations affecting final gradient 
penetration lie within the first cooperatively melting sequence. 
Fragments carrying substitutions in domains melting at a higher 
temperature reach final gel positions indistinguishable from wild 
type. The gradient data and the sites of substitution bracket the 
boundary between the first domain and its neighboring higher- 
melting domain within eight base pairs or fewer, in agreement with 
the calculated boundary. The correspondence between the gra- 
dient displacement of the mutants and the calculated change in 
helix stability permits substantial inference as to the type of sub- 
stitution. Excision of the lowest melting domain allows recognition 
of mutants in the next ranking domain. 

Detection and localization of single base substitutions within 
long DNA sequences may be impractical by complete sequence 
determination and improbable on the basis of restriction en- 
donuclease vulnerability. We present here the results of a pro- 
cedure by which DNA molecules that have minimal sequence 
differences are separated and by which some conclusions can be 
drawn as to the nature of the change. A number of samples can 
conveniently be examined in a single slab gel; each DNA species 
is focused into a sharp band at a gel position determined by its 
sequence and composition. The physical separation of frag- 
ments of altered sequence provided by the denaturing gel makes 
possible further analysis and manipulation. 

Our system makes an unconventional use of electrophoresis. 
Where DNA molecules migrate into a gradient of ascending 
concentration of denaturant, they undergo an abrupt decrease 
in mobility at a characteristic depth, resulting in positions and 
patterns that change little if application of the field is contin- 
ued. The retardation depth in the gradient is determined by the 
least stable part of the molecule and is relatively insensitive to 
other parts of the sequence or to the overall length (1). 

To understand the basis of the sensitivity of the system to 
single base substitutions, we have undertaken a close compar- 
ison of the gel results with those of a sequence-specific statis- 
tical mechanical theory of the stability of the double helix. Ex- 
perimental studies on the helix-disorder transition, melting, 
have not yet provided a detailed test of the theory, which pre- 
dicts intricate and interesting patterns for the progression of 
equilibria from full helicity to separated strands as the temper- 
ature increases for molecules of different sequence. In our sys- 
tem, the molecule is exposed to a gradual denaturation-pro- 
moting change in the medium, linearly equivalent (2) to a gradual 
increase in temperature. The strong decrease in mobility as the 
helix unravels provides the basis for both sequence-determined 
separation and examination of the helix-disorder transition. 

MATERIALS AND METHODS 
DNA Preparation. λ strains were obtained from D. Wulff as 
Sam7 derivatives substituted in the y region. Strain identifi- 
cations and sequence determinations are from Wulff et al. (3) 
and D. Wulff and M. Rosenberg (personal communications). 
DNA from those strains was prepared as described. DNA was 
also prepared from λ strain cI857 by lytic infection of Esche- 
richia coli strain K-12 W3110 by standard methods. Plasmid 
pKM2 (K. McKenney and M. Rosenberg, personal communi- 
cation), containing the wild-type λ sequence at position 38,989- 
40,291 inserted into the HindIII site of pBR322, was grown in 
E. coli HB101. Phage and plasmid DNAs were digested to com- 
pletion with Ava I/Bgl II or Alu I/Bgl II under the conditions 
specified by the supplier (Bethesda Research Laboratories). 
Digestion was stopped by the addition of EDTA to a final con- 
centration of 40 mM, glycerol to a final concentration of 10% 
(vol/vol), and a trace of bromphenol blue tracking dye. 

Gel Electrophoresis. Nondenaturing polyacrylamide (65 
mg/ml; acrylamide/bisacrylamide, 30:0.8) gels were run at 60°C 
in TAE buffer (40 mM Tris/20 mM NaOAc/1 mM EDTA, pH 
8/HOAc), with 3-5 µg of whole phage DNA or 0.5-1.0 µg of 
plasmid DNA in each slot. 

Denaturing Gradient Gels. The acrylamide concentration, 
65 mg/ml, and the TAE buffer concentration were uniform 
throughout the gels. All gradients consisted of linearly increas- 
ing concentrations of urea/formamide at a constant ratio. Gels 
were poured from outgassed solutions containing ammonium 
persulfate at 0.1 mg/ml and 0.01% N,N,N',N'-tetramethyleth- 
ylenediamine using a two-chamber gradient maker or the sy- 
ringe gradient pump previously described (4, 5). The gels were 
submerged in a 5-liter aquarium that contained the anode elec- 
trolyte, TAE. The electrolyte was stirred and controlled at 60.0°C, 
and a field of 6 V/cm was applied. 

RESULTS 

Gradient Discrimination of Mutants. We have compared wild 
type and 16 strains of λ that have substitutions in the y region 
using fragments produced by cleavage with Ava I/Bgl II. The 
fragments contain 536 base pairs (bp) preceded by four unpaired 
bases from the Bgl II site. Numbering begins with the first paired 
base in the Ava I site. A map of the sites and substitutions for 
each of the mutants is shown in Fig. 1. Strain cin-1 cnc-1 is a 
double mutation containing both cin-1 and cnc-1 substitutions. 

Abbreviations: bp, base pair(s); Tm, melting transition. 
The substitutions Ctrl and Ctrl are present only in strains also 
carrying the substitution cII3059, and strain cir5 contains both 
cir5 and cII3086 substitutions. 

DNA of each strain was digested with Ava I/Bgl II and sep- 
arated by length in a nondenaturing polyacrylamide gel. The 
corresponding 536-bp fragments migrate to the same depth in 
each lane. The narrow strip carrying the bands was sealed across 
the top of a gradient gel in which denaturant concentration in- 
creased from top to bottom in a uniform polyacrylamide matrix. 
A field of 150 V was applied for 14.5 hr at 60°C. 

As shown in Fig. 2, which presents a set of similar gradient 
gels with the wild-type fragment and 16 mutants, substitution 
of T·A for C·G at position 129, strain cy3019, results in retar- 
dation 0.9 cm higher in the gel than for wild type. The inverse 
substitution at position 136, strain cy3071, delays retardation 
beyond that of wild type. The shifts are not fully defined by the 
composition of the substitution; cy3019 is displaced almost twice 
as far as cII3105, although both represent replacements of G·C 
by A·T. The difference between cy2001 and cy3071 is a simple 
transversion, the interchange of cytosine and guanine across 
strands, but there is a more-than-2-mm difference in gel po- 
sitions. The doubly substituted strain, cin-1 cnc-1, identical to 
wild type in gross composition in that it carries both A·T → G·C 
and G·C → T·A substitutions, is also shifted. A single T·A → 
C·G substitution at position 142 in strain cII3086 effects retar- 
dation below wild type while an additional A·T → G·C substi- 
tution in strain cir5 delays retardation to an even greater depth. 
All of these substitutions occur within the first 144 bp adjacent 
to the Ava I site. The four mutants with substitutions at base 153 
and above have no detectable effect on the gradient position. 
Strain cII3059 and its derivative double mutants are discussed 
below. 



Effect of Single Base Substitution on the Mobility Shift. The 
basis for separation is shown by comparison of the electropho- 
retic mobilities of two fragments in gels in which both fragments 
migrate through a constant concentration of the denaturing sol- 
vent (4). The variation in mobility over a substantial range of 
denaturant concentration is displayed by a denaturing gradient 
perpendicular to the electric field. In this procedure, 20 µg of 
wild-type λ Sam7 DNA and 10 µg of DNA from strain λ 
Sam7 cy2001, containing an A·T → C·G substitution at position 136, 
were cleaved with Ava I/Bgl II and run into an agarose (15 mg/ml) 
slab gel from a single 13-cm-wide starting zone. An agarose strip 
containing the 536-bp band was cut from the ethidium-stained 
gel and sealed with hot agarose across the top of a polyacryl- 
amide (65 mg/ml) gel containing a linear gradient from 1.4 M 
urea/8% (vol/vol) formamide on the left to 3.5 M urea/20% (vol/ 
vol) formamide on the right. A field of 150 V perpendicular to 
the gradient was applied for 7 hr at 60°C (Fig. 3). 

Fragments on the left migrate through a column of minimum 
denaturant concentration. From left to right, fragment mobility 
at first decreases gradually, then suffers a sharp reduction two- 
thirds of the way to the right at about 2.8 M urea/16% (vol/vol) 
formamide. At this point, the fragment from strain 2001 is dis- 
tinguishable from wild type as a species that has half the staining 
intensity; its sharp mobility transition occurs at a very slightly 
higher denaturant concentration. The wild-type and mutant 
fragments appear to be superposed in both the high- and low- 
concentration regions where the mobility changes only slightly 
with increasing denaturant concentration. Fragment mobility 
decreases to less than one-fourth through the major transition 

zone. 
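
The position of the mobility transition can be checked against the gradient end points by simple linear interpolation. The sketch below is illustrative only: the function name is hypothetical, and the left-edge formamide value is taken as 8% (vol/vol), the value at which the urea/formamide ratio matches that of the right edge; treat that as an assumption.

```python
def denaturant_at(fraction, start, end):
    """Linearly interpolate a (urea in M, formamide in %) pair across the gel.

    fraction is the distance from the left edge divided by the gel width.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 (left) and 1 (right)")
    return tuple(s + fraction * (e - s) for s, e in zip(start, end))


# Assumed end points of the perpendicular gradient (urea M, formamide %).
LEFT_EDGE = (1.4, 8.0)
RIGHT_EDGE = (3.5, 20.0)

# Two-thirds of the way from left to right.
urea, formamide = denaturant_at(2 / 3, LEFT_EDGE, RIGHT_EDGE)
print(f"{urea:.1f} M urea / {formamide:.0f}% formamide")
```

Under these assumptions the two-thirds point falls at about 2.8 M urea/16% formamide, matching the transition position quoted in the text.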

The Calculated Melting Map. To examine the effect of a sin- 
gle base substitution on the calculated progression of the helix- 




formamide to 3.5 M urea/20% (vol/vol) formamide, in the same direction as the electric field. The figure is a composite 01 we 
each of four gels, each containing one or more wild-type (wt) samples. 
Fig. 3. Comparison of mobilities of wild-type and cy2001 frag- 
ments in a denaturing gradient perpendicular to the electric field. A 
mixture of wild-type and cy2001 fragments was electrophoresed in a 
gel in which the denaturant concentration was constant along the path 
of electrophoretic movement but increased linearly in the perpendic- 
ular direction. The sample was applied uniformly along the top from 
a strip of an agarose gel containing nominally the same total amount 
of both fragments at every point. The amount of mutant DNA was half 
that of wild-type DNA. 

random chain equilibrium, we have used the Fixman and Friere 
(6) modification of the algorithm presented by Poland (7) for cal- 
culation of the equilibrium melting transition (Tm) probability 
as a function of sequence and temperature. We have replaced 
the 2-valued stability parameters for base pairs (usually given as 
Tm,AT or Tm,GC) by the set of 10 values for nearest-neighbor dou- 
blets suggested by Gotoh and Tagashira (8). The values are given 
in Fig. 4. A base-pair substitution changes the stability values 
in two adjacent positions, and a transversion without substitu- 
tion results in a significantly different net value although the 
overall base composition remains constant. Since the Poland- 
Fixman-Friere calculation depends on at least two statistical pa- 
rameters that are not precisely known (the cooperativity con- 
stant, σ, and a loop-closure exponent, a), we have compared 
results for several values and combinations of each. We have 
ascribed the variation in Tm of nearest-neighbor doublets to 
variation in ΔH, holding ΔS constant, but the converse as- 
sumption gives essentially the same results. The patterns of 
melting progression and the differences due to substitution show 
only insignificant differences so long as the parameters do not 
depart drastically from values previously published by others. 
All of the results shown here are based on σ = 3.3 × 10⁻⁵, the 
center of the range proposed by Amirikyan et al. (9), and a = 
2.0 (10). 
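
The nearest-neighbor bookkeeping described above can be illustrated with a short sketch. It assigns no stability values; it only counts doublets, treating a doublet and its reverse complement as the same duplex step, which is why 16 possible doublets reduce to 10. The strand used here is hypothetical and is not taken from the λ fragment.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(doublet):
    """Return one label for a doublet and its reverse complement."""
    reverse_complement = doublet.translate(COMPLEMENT)[::-1]
    return min(doublet, reverse_complement)


def doublet_counts(strand):
    """Count canonical nearest-neighbor doublets along one strand."""
    return Counter(canonical(strand[i:i + 2]) for i in range(len(strand) - 1))


def gc_pairs(strand):
    """Number of G.C base pairs in the duplex formed by this strand."""
    return sum(base in "GC" for base in strand)


# The distinct duplex doublets: 4 self-complementary plus 6 paired classes.
DISTINCT_DOUBLETS = {canonical(a + b) for a, b in product("ACGT", repeat=2)}

# Hypothetical strand; interchange C and G across strands at one position.
wild = "ATTGCATCA"
mutant = wild[:4] + "G" + wild[5:]

lost = doublet_counts(wild) - doublet_counts(mutant)
gained = doublet_counts(mutant) - doublet_counts(wild)
print(len(DISTINCT_DOUBLETS), gc_pairs(wild), gc_pairs(mutant))
print(dict(lost), dict(gained))
```

In this toy case the base composition is unchanged, yet two doublets are exchanged for two others, so a calculation based on doublet stabilities can distinguish the sequences where a 2-valued calculation cannot.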

The base sequence of the fragment enters the calculation of 
the melting progression as the sequence of nearest-neighbor 
stability values, represented as temperatures. The expected 
melting progression along the molecule calculated with stan- 
dard parameters is shown in Fig. 4 as a melting map — the tem- 
perature at which each base will be at equilibrium with equal 
probability of helix or random chain configuration. At any tem- 
perature appreciably below the contour for that base pair, the 
pair can be regarded as helical and, at any temperature appre- 
ciably above the contour, the pair can be regarded as melted. 
As expected, a few base pairs at each end melt at lower tem- 
peratures; ends of the helix fray gradually prior to any large co- 
operative changes. The first domain to melt consists of the bases 
between 32 and 142. The uniformity of the ordinate value shows 
that all of the bases of this region melt as a block. A base at the 
Fig. 4. Expected progression of melting of the wild-type λ frag- 
ment. The abscissa represents positions along the sequence between 
nearest-neighbor base pairs from the first pair following the Ava I scis- 
sion position through the last pair at the Bgl II site. The distribution 
of each base pair between the helical and melted states was determined 
from the Fixman-Friere (6) algorithm at 0.1°C increments, and the 
temperature for 50: 50 equilibrium was inferred by cubic interpolation. 
The function is not shown in the highest melting region between bases 
389 and 405 where the properties depend on the probability of complete 
separation and bimolecular reassociation. σ = 3.3 × 10⁻⁵; S(n) = σ 
n^-2.0, approximated as a sum of nine exponentials; ΔS/R = 12.25 mol. 
The Gotoh-Tagashira values for nearest-neighbor 5' -3' doublets in 19.5 
mM Na+ are as follows: T-A, 36.73°C; T-T, 54.50°C; T-G, 86.44°C; A-T, 
57.02°C; A-G, 58.42°C; A-C, 97.73°C; C-G, 72.55°C; C-C, 85.97°C; G-C, 
136.12°C (8). 

center of the block progresses from 10% to 90% probability of 
unpairing and unstacking between 66.3°C and 67.3°C. The cal- 
culated contour rises 0.3°C between bases 143 and 144, 2.8°C 
between bases 144 and 145, 1.5°C between bases 145 and 146, 
and 0.4°C between bases 146 and 147, giving the appearance of 
a steep wall. The last melting necessary for strand separation is 
omitted; it is concentration dependent and unnecessary for the 
present analysis. The calculation shows, in general, that melting 
of a DNA molecule can be expected to proceed stepwise as the 
temperature is raised; the effect of cooperativity is strong enough 
that fairly long blocks of contiguous helix melt within temper- 
ature intervals much narrower than the temperature differ- 
ences between entire blocks. The separation into blocks, or do- 
mains, follows from the algorithm, and no assumptions or a priori 
estimates of domain boundaries are required. The loci of do- 
main boundaries are not discernable by inspection of the se- 
quence. 
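
The arithmetic behind the "steep wall" can be laid out as follows. This is a small sketch using only the step values quoted in the text; the variable names are illustrative.

```python
# Rise of the calculated contour (in degrees C) between adjacent bases,
# as quoted in the text.
STEP_RISE_C = {
    (143, 144): 0.3,
    (144, 145): 2.8,
    (145, 146): 1.5,
    (146, 147): 0.4,
}

# Width of the 10%-90% melting interval at the centre of the first domain.
TRANSITION_WIDTH_C = 67.3 - 66.3

total_rise = sum(STEP_RISE_C.values())
steepest_step = max(STEP_RISE_C, key=STEP_RISE_C.get)

print(f"total rise over bases 143-147: {total_rise:.1f} C")
print(f"steepest step: bases {steepest_step[0]}-{steepest_step[1]}")
print(f"rise / transition width: {total_rise / TRANSITION_WIDTH_C:.1f}")
```

The contour climbs about 5°C over four base pairs, several times the roughly 1°C interval within which the first domain itself melts.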

We note that the lowest melting domain contains the cyL re- 
gion, the sequence thought to determine polymerase recogni- 
tion, and the cyR region falls into a higher melting domain be- 
ginning at about base 147 (3). 

The changes in the melting progression effected by mutations 
between bases 83 and 168 are presented in Fig. 5, together with 
a section of the standard map. The calculations are shown in Fig. 
5 B and C as the differences in temperature for the 50% point 
in the melting equilibrium between the wild-type fragment and 
each mutant at each base pair. The difference maps were ob- 
tained by subtracting the melting map of the complete se- 
quence of the wild-type fragment from the corresponding com- 
plete map of each mutant. As shown in Fig. 5B for four 
representative substitutions and a double substitution, substi- 
tutions that effect substantial displacements in the gels dis- 
tinctly alter the melting temperature in the entire first domain. 
Except for slight shifts in the domain boundaries in this set (a 
shift appears as a spike in the melting map), melting is essen- 
tially identical with that of wild type at all other parts of the se- 
quence. The effect of the double substitution, cin-1 cnc-1, and 
the comparison between cy2001 and cy3071 are of particular 
interest because corresponding differences are seen both in the 
gel positions and in the calculation between domains of identical 
Fig. 5. Region of the first (lowest melting) domain and difference 
melting maps resulting from base substitutions. (A) The low-numbered 
region of the melting map of the wild-type fragment (Fig. 4) is shown 
with an expanded temperature scale. (B and C) Tm values for each base 
in the wild-type sequence have been subtracted from Tm values cal- 
culated for the mutant sequences. 

base composition as a consequence of nearest-neighbor inter- 
actions. 
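
The construction of a difference map is a position-by-position subtraction. The sketch below uses short hypothetical melting maps (eight positions, invented temperatures) purely to show the operation; it does not reproduce any calculated map from the paper.

```python
def difference_map(mutant_tm, wild_tm):
    """Subtract the wild-type melting map from the mutant map, base by base."""
    if len(mutant_tm) != len(wild_tm):
        raise ValueError("maps must cover the same positions")
    return [m - w for m, w in zip(mutant_tm, wild_tm)]


def affected_positions(diff, threshold=0.01):
    """1-based positions where the shift is at least `threshold` degrees C."""
    return [i + 1 for i, d in enumerate(diff) if abs(d) >= threshold]


# Hypothetical maps: a five-base low-melting block followed by a
# higher-melting block; the mutant lowers only the first block.
wild_map = [66.8] * 5 + [71.8] * 3
mutant_map = [66.5] * 5 + [71.8] * 3

diff = difference_map(mutant_map, wild_map)
print(affected_positions(diff))
```

In this toy case the whole first block shifts together while the neighboring block is untouched, which is the pattern the text describes for substitutions inside the first domain.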

The calculated melting difference maps for strains that reach 
retardation depths in the gel indistinguishable from the parental 
λ strain are shown in Fig. 5C. They have a negligible effect (less 
than 0.01°C) below base 145 in the first domain but depress the 
edge of the adjacent high-melting domain. The effects of these 
mutations tend to be significant from base 147 to about base 225. 
Substitutions at sites 424 in cII3638 and 433 in cII3520, show 
gradient positions indistinguishable from wild type and affect 
the melting map only near the high-numbered end. 

Detection of Mutants Near the Bgl II End. The region within 
which substitutions in the 536-bp Ava I/Bgl II fragment are dis- 
cernible in the gel corresponds to the lowest melting domain of 
the theoretical map. Excision of the low-numbered end of the 
sequence can be expected to promote the region of next higher 
Tm, extending from base 416 to the Bgl II end, to first melting, 
so that in the shortened fragment substitutions in this region 
should be recognizable. Gel positions of the truncated frag- 
ments of the wild type, cII3638, and cII3520, in which the same 
sequence was cleaved between bases 235 and 236 with Alu I, are 
shown in Fig. 6. Wild-type fragments from the recombinant 
plasmid pKM2 (lanes A and E) focused into a sharp band at the 
same depth as a band of the fragment from whole phage DNA 
(lane B). The mutant fragments, indistinguishable from wild-type 
fragments in the gradient in the original 536-bp molecule (Fig. 
2) focus at greater depths, in good agreement with the calculated 
effect of the substitutions based on melting and mobility theory. 
Note that all of the fragments derived from whole phage DNA, 



Proc. Natl. Acad. Sci. USA 80 (1983) 







Fig. 6. Gel positions of truncated fragments substituted at the high- 
numbered end. The 301-bp Alu I/Bgl II fragments from pKM2 (lanes 
A and E), Sam7 (lane B), cII3638 (lane C), and cII3520 (lane D) were 
analyzed on a 42-60% denaturant/polyacrylamide (150 mg/ml) gel. 
The electric field was applied for 14 hr. 

mutants and wild type, appear as triplet bands in which the sub- 
stitutions uniformly shift all three members. A singlet lower in 
the gel, unaffected by the substitutions, provides a reference 
position. Tripletting of the 301-bp Alu I/Bgl II fragment was 
also obtained from wild-type phage grown lytically, while the 
plasmid-derived singlet was unaffected by mixed digestion with 
whole phage DNA. These results are consistent with the sup- 
position that base modification during phage growth may be a 
source of the extra bands. 

DISCUSSION 

Because the gradient interval between any pair of fragments 
differing by a single base substitution in the determinant do- 
main is larger than the width of the bands, samples can be re- 
covered from the gels enriched for either component. The bands 
are narrower than those in simple length separations by con- 
stant-velocity electrophoresis because of the focusing due to the 
reduction in mobility as the determinant domain melts. 

The correspondence between these results and properties 
calculated from the sequence by the Poland-Fixman-Friere al- 
gorithm provides more detailed support for this melting theory 
than has been available from hyperchromicity profiles. Because 
the six substitutions that alter gradient depth lie below position 
145 and the six that do not alter gradient depth lie above position 
152, it appears that the sequence above position 152 does not 
participate in determining the retardation depth. Following the 
explanation we have offered for identification of the decrease 
in mobility with partial melting (1, 11), we infer that the low- 
numbered region ending between bases 144 and 152 melts at a 
substantially lower temperature and independently of melting 
above base 152. That difference and the resulting decoupling 
emerges from the theoretical calculation as a distinct domain 
boundary within the limits specified by these mutants. The loss 
of mobility from melting of the first domain prevents the frag- 
ment from reaching the gradient depth necessary for further 
melting within the duration of the run. 

Variation of the loop-entropy and cooperativity parameters 
can result in melting at a slightly lower temperature in the re- 
gion between bases 83 and 131 than in the region near base 61, 
but there is no noticeable change in the maps shown in Fig. 5B 
nor in the boundary position. 

There is a close proportionality between the calculated al- 
teration, ΔTm, of the plateau Tm of the first domain and the change 
in gradient depth reached by the fragments (Fig. 7). For se- 
quences of different domain lengths, a simple comparison ac- 
cording to domain Tm is not appropriate and, hence, the se- 
quence carrying the double substitution cir5-cII3086, which shifts 
the domain boundary about six bases to the Ava I end, is omit- 
Fig. 7. Gel displacement and alteration in the Tm of the lowest 
melting domain. ΔTm is the temperature difference between each mu- 
tant and wild type at the center of the first domain. ΔY is the increment 
in depth in the gradient referred to wild type. The point at the origin rep- 
resents wild type and seven strains carrying substitutions above base 
152. The mutants that give triplet zones are as follows: £, cII3059; 
cII3059 Ctrl; x, cII3059 ctrl; they have not been included in the cal- 
culation of the regression line. 

ted from the regression line. The level of consistency between 
the calculated Tm of the first domain and the observed differ- 
ences in gel depth supports the relative assignments of nearest- 
neighbor stability values proposed by Gotoh and Tagashira (8) 
on the basis of less direct evidence. Our use of these values is 
somewhat arbitrary, in that they were proposed as applicable to 
conditions differing from those in the gels (0.02 M aqueous Na+ 
rather than 0.02 M Tris+ and urea/formamide). However, the 
doublet stabilities always enter the calculation as the sum of 
nearest-neighbor values from both sides of each base pair, and 
the present results remain compatible with other doublet-sta- 
bility assignments. Gel measurements with a large set of mu- 
tants may constitute an appropriate means to infer nearest- 
neighbor stability assignments. 
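
The kind of regression shown in Fig. 7 can be sketched as an ordinary least-squares fit of gel displacement against the calculated change in domain Tm. The data points below are hypothetical placeholders, not measurements from the paper, and fitting with a free intercept is an assumption; the source does not state how the line was fitted.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values: change in first-domain Tm (degrees C) and the
# corresponding change in final gel depth (cm) relative to wild type.
delta_tm = [-0.4, -0.2, 0.0, 0.2, 0.4]
delta_y = [-1.0, -0.5, 0.0, 0.5, 1.0]

slope, intercept = fit_line(delta_tm, delta_y)
print(f"slope = {slope:.2f} cm per degree C, intercept = {intercept:.2f} cm")
```

With real measurements, the slope would give the gel displacement expected per degree of calculated stability change, and the scatter about the line would indicate how well a given set of doublet values accounts for the observations.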

Calculation of the melting map using only 2 stability values, 
one for G·C pairs and another for A·T, without consideration of 
neighbors results in a similar melting map. However, the 2-val- 
ued calculation cannot account for the gel displacement of the 
double mutant cin-1 cnc-1, which has a base composition iden- 
tical to that of the wild type, nor for the difference due to trans- 
version found in the comparison between cy2001 and cy3071. 

While most bands are accompanied to some extent by weak 
satellites slightly deeper in the gradient, the relative intensities 
of the satellites from cII3059 and its two derivatives are con- 
spicuously greater. This property is retained through repeated 
plaque purification. Since the multiplicity also appears dis- 
tinctly where the cII3059 fragment moves through a constant 
denaturant concentration, giving a pattern similar to that shown 
in Fig. 3, the multiplicity is not generated by details of the gra- 
dient. The DNA in each band of the triplet behaves as a stable 
single component after isolation and migration into a new gra- 
dient. The calculated effect of the nominal base substitution in 
cII3059 appears in the difference melting map almost entirely 
as a perturbation beyond the boundary of the first domain. We 
have been unable to arrive at a significantly larger change in the 
first domain despite substantial ad hoc adjustment of many of 
the nearest-neighbor stability values by 5-20°, either singly or 
in combination, or by changes in other parameters. If the most 
displaced (highest) bands of the cII3059 mutants are related to 
the principal wild-type band, this effect could be construed as 
an influence on mobility originating from mutation about 6 base 
pairs beyond the calculated boundary. If the least shifted (low- 
est) component is taken to indicate the effect of substitution, the 
displacements are compatible with the calculated domain 
boundary, as shown by the cluster of points near the origin in 
Fig. 7. The interval of 2 bp between cII3105 and cII3059 cor- 
responds to the steepest part of the calculated boundary. 

These results suggest that all substitutions from point mu- 
tations can be detected in denaturing gradient gels if they occur 
in the lowest melting domain of the molecule. Sensitivity will 
depend on the net stability change, which depends, in turn, on 
the specific context of the substitution, and on the length of the 
domain over which the effect is averaged. Perturbations at least 
as large as those from substitutions can be expected from single 
base pair insertions and deletions and from some base modi- 
fications. In the present example, the sensitive region initially 
constitutes 20% of the 536-bp fragment; it can be shifted to a 
different 20% by cleavage, which transfers highest melting 
priority to a different domain. Since the separations do not crit- 
ically depend on the length of the molecules, random frag- 
mentation may be as useful as restriction cleavage. By attach- 
ment of a higher melting section to small restriction fragments, 
nearly any sequence of appropriate length can be made to con- 
stitute the lowest melting domain in the resulting composite 
molecule, and nearly all substitutions may be made discernible. 
ABSTRACT 

Synthetic DNA oligonucleotides can serve as efficient primers for DNA synthesis even when there 
is a single base mismatch between the primers and the corresponding DNA template. However, 
when the primer-template annealing is carried out with a mixture of primers and at low stringency, 
the binding of a perfectly matched primer is strongly favored relative to a primer differing by a 
single base. This primer competition is observed over a range of oligonucleotide sizes from twelve 
to sixteen bases and with a variety of base mismatches. When coupled with the polymerase chain 
reaction, for the amplification of specific DNA sequences, competitive oligonucleotide priming provides 
a simple general strategy for the detection of single DNA base differences. 

INTRODUCTION 

Techniques enabling the rapid detection of single DNA base changes are important tools 
for genetic analysis (1). When the precise DNA base change in a mutation is known, allele- 
specific oligonucleotides (ASOs) can identify the unique sequences by differential 
hybridization under stringent conditions (2). Typically, 18-20 base oligonucleotides are 
constructed with perfect complementarity to either a normal (wild-type) or mutant sequence. 
The DNA for analysis is tethered to a solid support and hybridized separately to the 
radioactive probes. The homology between each ASO and the test sequence is then revealed 
by sequential washings of the hybrids at high stringency so that the mismatched probe 
is washed free while the perfect match remains bound. 

In order to simplify the detection of single DNA base changes we have used an alternative 
strategy employing mixtures of synthetic DNA oligonucleotides as primers for DNA 
synthesis. An example of the basic principle is outlined in Fig. 1. Two synthetic 
oligonucleotide primers are mixed in a single annealing reaction with a DNA template. 
Each of the primers is capable of priming DNA synthesis at the same site. However, when 
one primer is perfectly complementary to the DNA template it can bind in preference to 
a primer that differs by a single base. The use of a third oligonucleotide primer (common 
primer) allows the identification of the successfully competing primer by the polymerase 
chain reaction (PCR) (3-6). When the primer-annealing reactions are carried out at low 
stringency and with an excess primer to template ratio, the perfectly matched primer can 
be favored with a level of discrimination greater than 100:1. The perfectly matched primer 
will also compete successfully when it is at low abundance (i.e., in the presence of up 
to a 100-fold excess of a mismatched primer) or when the correct match is only one of 
a four member mixed oligonucleotide family. We also show here that successful competitive 
oligonucleotide priming (COP) is dependent on the length of the primers and that the COP 
Figure 1. Overview of the general strategy for detection of single DNA base differences by competitive 
oligonucleotide priming (COP). A. DNA template is mixed with two oligonucleotides that differ by a single 
DNA base. If one oligonucleotide is a perfect match to the DNA template it will bind in preference to a mismatched 
oligomer. B. The correct match primer may be identified by differentially labeling the two oligonucleotides and 
including a third 'common' oligonucleotide primer. The common primer and the successful COP primer are 
incorporated into a DNA fragment generated by PCR. Identification of the incorporated COP primer indicates the 
template sequence. 

system can be coupled to the PCR to detect single DNA base changes in mammalian 
genomic DNA. 

MATERIALS AND METHODS 

Oligonucleotide primers (Table 1) were synthesized on an Applied Biosystems 380B 
oligonucleotide synthesizer using β-cyanoethyl phosphoramidite chemistry. Mixed 
oligonucleotides were synthesized using the 380B mixed (competitive) coupling functions. 
The relative efficiency of addition of individual bases during mixed synthesis has been 
the subject of previous reports by the manufacturer (7). Following synthesis, oligonucleotides 
were deprotected in ammonium hydroxide for 6-12 hours at 55°C, dried, dissolved in 
formamide and purified by denaturing polyacrylamide gel electrophoresis. The 
oligonucleotides were electroeluted from gel slices and finally desalted over an NENsorb 
column (Dupont). 

The PCR for the amplification of the competitive priming events was carried out either 
with the large fragment from E. coli DNA polymerase I (Klenow) (United States 
Biochemical Corporation; USB) or with the heat stable DNA polymerase from Thermus 
aquaticus (Taq) (Perkin Elmer/Cetus). Klenow reactions were in a final volume of 100 
TABLE 1. OLIGONUCLEOTIDE PRIMERS 

Primer  Sequence*                        Length  Template 
1       CCCAGTCACGACGTT                  15      M13 'Common' 
85      AGCTCGGTACCC                     12      M13 polylinker 
86      AGCTCGG(TA)AC(CG)C               12 
98      CGAGCTCGG(TA)AC(CG)C             14 
99      TTCGAGCTCGG(TA)AC(CG)C           16 
100     AATTCGAGCTCGG(TA)AC(CG)C         18 
89      AATTCGAGCTCGGTACCCGG             20 
90      AATTC(GC)AGCTCGG(TA)AC(CG)CGG    20 
92      CAAGTGAATGTC                     12      OTC mutation (+) 
93      CAAGTTAATGTC                     12      OTC mutation (spf) 
94      CTGTCCACAGAAACAGGC               18      OTC 'Common' 
246     GGCGATGTCAATAGGACTCCAGATG        25      HPRT genomic 
352     CCACGAAGTGTTGGATATAAGC           22      HPRT genomic 
383     TAATGACACAAACATG                 16      HPRT Mutation (+) 
384     TAATGACATAAACATG                 16      HPRT Mutation (-) 

*Parentheses denote mixtures of bases at a single position 



land 
rare 
(the 



-t>. 



t 



μl containing 30 mM Tris-acetate, pH 7.9, 60 mM sodium-acetate, 10 mM magnesium-acetate, 
10 mM dithiothreitol, 1.5 mM each of dATP, dCTP, dGTP, dTTP, 4 μM of each 
primer (or primer family) and 0.5 to 1.0 μg of DNA template (3). To initiate the Klenow 
catalyzed PCR reactions, DNA was denatured in 0.4 N NaOH for 5 minutes at 25°C, 
neutralized with 1/10 volume of 2 M ammonium-acetate, pH 4.5, and precipitated with 
2.5 volumes of ethanol. The pellet was resuspended in the PCR mix and annealed at 28°C 
for 3 min before the addition of 5 units of enzyme. The PCR proceeded with 2 min 
polymerization at 28°C, 2 min denaturation at 105°C (in a heat block containing glycerol) 
and 30 sec annealing at 28°C before fresh enzyme was added. The Taq PCR followed 
the procedure of Kogan et al (5) except that the concentration of each primer was 1.0 
μM. Temperature cycling of 37-55°C, 30 sec; 65°C, 3 min; 92°C, 1 min was controlled 
by an automated thermocycler (Perkin Elmer/Cetus). 

Oligonucleotide primers were labeled at the 5' terminus with T4 polynucleotide kinase 
(USB) to a final specific activity of 25 Ci/mmol. PCR products were analyzed on either 
4% NuSieve agarose (Marine Colloids) or 12% polyacrylamide gels and dried for 
autoradiography. Plasmid DNA was prepared by two rounds of cesium chloride/ethidium 
Figure 2. Strategy for analysis of competition between closely related oligonucleotide primers for an M13 (mp18) 
DNA template. The M13 multiple cloning site was PCR amplified with a radiolabeled (*) 'common' primer 
and either a perfectly matched opposing primer or a mixture of opposing primers that included single base mismatches 
to the DNA template. Incorporation of mismatched primers would lead to loss of restriction endonuclease recognition 
sites. 
bromide density gradient centrifugation and human genomic DNA was prepared from 
transformed human lymphoblasts using an Applied Biosystems 340A DNA extractor. 

RESULTS 

Competition Between Oligonucleotide Primers of Different Length 
Single stranded DNA (ssDNA) from the filamentous phage M13 (mp18) (8) was employed 
to first demonstrate the efficiency of the COP method. The scheme is illustrated in Fig. 
2. The general strategy was to amplify a region of the mp18 multiple cloning site (polylinker) 
with primers that would either faithfully copy the restriction endonuclease recognition 
sequences or destroy the sites because of incorporation of oligonucleotide primers that 
included mismatches to the mp18 template. The DNA sequence of the template in the 
region bound by the competitive and common primers is shown in Fig. 3a. 

The 'common' primer (#1, Table 1, Fig. 3) was a 15 base oligonucleotide that is often 
used as a universal DNA sequencing primer. Primers #85, 86, 89, 90, 98, 99 and 100 
each overlap the opposite end of the mp18 polylinker where the SacI, RsaI and MspI 
restriction endonuclease recognition sites are located. Each of the primers was constructed 
so that it would be complementary to the products of extension of primer #1 through 
the mp18 polylinker. Primers #85 (12-mer) and #89 (20-mer) were completely 
homologous to mp18, and therefore could provide perfect copies of the ssDNA template. 
In contrast, primers #86, 98, 99, 100 and 90 were constructed as mixtures. Each contained 
two or more positions at which two nucleotides were added during synthesis. For example, 
primer #86 contained a mixture of A and T at position eight and C and G at position 
eleven. Thus, primer #86 had a complexity of four members, one with complete homology 
to the corresponding region of mp18, two with base mismatches (A:A, G:G) that altered 
single restriction endonuclease recognition sequences (RsaI or MspI) and one that altered 
both the enzyme sites. 
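
The composition of a mixed primer family can be enumerated directly from the parenthesis notation of Table 1. The following is a minimal sketch, not part of the original methods; the function names are hypothetical and the primer strings are taken from Table 1 as printed.

```python
import re
from itertools import product

def expand_mixed_primer(primer):
    """Return every individual sequence encoded by a mixed primer such as
    'AGCTCGG(TA)AC(CG)C', where parentheses list the bases mixed at one position."""
    # Each token is either a parenthesised group of alternatives or a single base.
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGT])", primer)
    choices = [group or base for group, base in tokens]
    return ["".join(combo) for combo in product(*choices)]

def mixed_positions(primer):
    """Return the 1-based positions (counted from the 5' end) that are mixtures."""
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGT])", primer)
    return [i + 1 for i, (group, _) in enumerate(tokens) if group]

family_86 = expand_mixed_primer("AGCTCGG(TA)AC(CG)C")  # primer #86, Table 1
perfect_match_85 = "AGCTCGGTACCC"                      # primer #85, Table 1

# Count how many bases of each family member differ from the perfect match.
mismatch_counts = [
    sum(a != b for a, b in zip(member, perfect_match_85)) for member in family_86
]
for member, count in zip(family_86, mismatch_counts):
    print(member, "differences from #85:", count)
print("mixed positions:", mixed_positions("AGCTCGG(TA)AC(CG)C"))
```

Under these assumptions the family has four members: one identical to primer #85, two differing at a single position, and one differing at both, with the mixed positions falling at eight and eleven as stated in the text.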

Primer #1 was radiolabeled at the 5' terminus and employed in separate PCR reactions 
with each of the other primers and mp18 ssDNA template. The reaction products were 
either directly analyzed by gel electrophoresis and autoradiography, or first digested with 
a restriction endonuclease that identified a site within the amplified region. As expected, 
the primer pair #1/85 generated an 85 bp fragment that was digested to 
completion by PstI, yielding a 48 bp radiolabeled fragment, and by RsaI or MspI, generating 
77 or 74 bp products, respectively (Fig. 4a). Thus, the perfectly matched primer #85 
was incorporated and faithfully reproduced the restriction endonuclease recognition sites 
from within its sequence. Somewhat surprisingly, an identical result was obtained using 
the primer mixture #1/86. As only 25% of the primer #86 family is expected to be perfectly 
homologous to the mp18 template, the presence of RsaI and MspI recognition sites within 
the amplified product indicated preferential incorporation of the perfectly matched primer 
relative to the family members with single or double DNA base mismatches. The apparent 
discrimination afforded by the competition for the correct match (i.e., the relative 
incorporation of the perfect match vs a mismatch) is greater than 100:1 and even a prolonged 

Figure 3. DNA sequence of the DNA templates used to analyze primer competition. Arrows showing the 
oligonucleotide primers point 5' to 3'. The sequences of the individual oligonucleotides are shown in Table One. 
A. M13 mp18 polylinker region. B. Murine ornithine transcarbamylase cDNA (9). C. Human hypoxanthine 
phosphoribosyltransferase (HPRT) exon sequences. The primers for the amplification of the human genomic DNA 
were each complementary to exon sequences. 
exposure of the gel illustrated in Fig. 4a failed to show any evidence of the primer pair 
#1/86 product being refractory to RsaI or MspI [...] 

The successful competition of a correct [...] 
length of the oligonucleotide. When the perfect match 20-mer #89 was employed [...] 
Figure 4. Competition between closely related primers for the M13mp18 single stranded DNA template. A complete 
explanation is given in the text. The results of restriction endonuclease digestion of amplified M13 DNA using 
A. 12 base, B. 20 base and C. 14, 16 and 18 base oligonucleotides as COP primers are shown. The incorporation 
of mismatched oligonucleotides leading to loss of restriction endonuclease recognition sites is observed when 
the competing oligonucleotides are 20-mers (B, last three lanes). 

3b). Primers #92 and 93 are 12-mers that match the wild-type and spf sequences, 
respectively and #94 is a common 18-base primer complementary to the OTC cDNA 
in an opposite orientation to primers #92 and 93. Ten cycles of PCR were performed 
with cloned wild-type OTC cDNA or the spf OTC cDNA as template using primers #94 
and an equimolar mixture of #92 and 93. A trace of radiolabeled primer #92 or 93 was 
used to monitor the competition between the two oligonucleotides. 

Fig. 5a shows an ethidium bromide stained 4% NuSieve agarose gel used to analyze 
PCR products generated from primers #94/92/93. A fragment of the predicted size is 
Figure 5. [...] in cloned murine ornithine transcarbamylase (OTC) cDNA. 
[...] (#92/93) and a common primer (#94) (Table One) were analyzed by agarose gel electrophoresis. 
[...] agarose gel electrophoresis of COP [...] 
normal match primer labeled. 2. normal template, mutant [...] 
of perfectly matched DNA templates. 

generated from both the wild-type and the mutant templates. When this gel was dried and 
exposed to X-ray film only the fragments that were generated in reactions containing a 
radiolabeled primer that was a perfect match to the template had incorporated radioactivity 
(Fig. 5b). A prolonged exposure revealed faint bands from mismatch incorporation but 
indicated that the level of discrimination was greater than 100 to 1. 
Preference for the 'correct' oligonucleotide primer 
The strong preference of a DNA template for a 'correct' primer was further demonstrated 
by competing a perfectly-matched oligonucleotide primer with an excess of a mismatched 
oligonucleotide (Fig. 6). The preference of the cloned wild-type OTC template for the 
perfect match primer #92 above #93 when the two oligonucleotides were present in 
equimolar ratios (Lanes 1 and 2) was reduced only slightly when the radiolabeled mismatch 
(#93) was present in a 100-fold molar excess (Lane 3). At a 1000:1 [...] 

concentration of the correct primer was approximately that of the DNA template (Lane 
4) the mismatch was incorporated with a still lower efficiency than when the mismatch 
Figure 6. Successful primer competition in the presence of an excess of mismatched primer. The experiment 
is similar to that illustrated in Fig. 5 and a complete description is given in the text. Lanes 1 and 2 show incorporation 
and exclusion of radiolabeled perfect match and mismatched primers, respectively. Lane 3 indicates the level 
of incorporation of a radiolabeled mismatched primer present in a 100-fold excess above a perfectly matched, 
unlabeled primer (4.0 μM vs 40 nM); Lane 4, mismatch present at 1000-fold excess (4.0 μM vs 4.0 nM); Lane 
5, mismatch alone (4.0 μM). Lane 6 shows the exclusion of a radiolabeled mismatched primer (4.0 μM) that 
was annealed to the DNA template for 3 min before addition of an equimolar amount of the correct match primer 
and initiation of the reaction. 

primer was present alone (Lane 5). In a further reconstruction experiment the template 
was annealed to the radiolabeled mismatched oligonucleotide for the usual 3 min. before 
an equimolar amount of the correct primer was added. DNA polymerase was added after 
1 minute more annealing and the PCR was carried out as before. Surprisingly, the correct 
match primer was predominately incorporated (Lane 6) reflecting the ability of the perfectly 
matched primer to displace any mismatched primer that might have been bound. 
Detection of Single DNA Base Differences in Genomic DNA Using Taq DNA Polymerase 
To test whether the COP mutation detection system could be adapted to conditions that 
enable the use of the heat stable Taq DNA polymerase (5, 6), oligonucleotides 
complementary to normal or mutant human hypoxanthine phosphoribosyltransferase (HPRT) 
sequences were constructed (Table 1, ref 9). The oligonucleotides (16-mers) differed by 
a single base (C vs T) at the eighth position and had previously been employed as ASO 
hybridization probes to identify the corresponding normal and mutant alleles in a family 
study of HPRT deficiency (10). To enable COP analysis of the G to A transition, an 
approximately 1950 base region of the HPRT gene containing the known mutation site 
was first PCR-amplified from genomic human DNA samples using primers #246/352. 
This fragment appeared homogeneous when analyzed by agarose gel electrophoresis. Five 
percent of the initial reaction products were then taken to initiate a further 10 rounds of 
PCR, with each of the allele-specific COP oligonucleotides (#383/384) and a common 
(#352) primer present. Four COP reactions were performed with either the correct match 
or mismatched primers radiolabeled at the 5' terminus and with either the normal or mutant 
'preamplified' alleles as DNA templates. Analysis of each of the COP reactions by agarose 
gel electrophoresis and ethidium bromide staining revealed predominant products of the 
expected size. When the gel was dried and exposed to X-ray film it was found that the 
only fragments that were radiolabeled were the expected sized products from reactions 
where the radiolabeled primers perfectly matched the DNA templates. Thus the normal 
and mutant alleles were each correctly identified (Fig. 7). 
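
The single-base differences between the competing primer pairs can be located by a direct comparison of the Table 1 sequences. The sketch below is illustrative only; the helper name is hypothetical and the sequences are assumed to be exactly as printed in Table 1. It reports each difference counted from both ends, since the text does not state which end its position numbering starts from.

```python
def differing_positions(primer_a, primer_b):
    """Return (position from 5' end, position from 3' end, base_a, base_b)
    for every position at which two equal-length primers differ (1-based)."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must be the same length")
    length = len(primer_a)
    return [
        (i + 1, length - i, a, b)
        for i, (a, b) in enumerate(zip(primer_a, primer_b))
        if a != b
    ]

# Competing primer pairs as listed in Table 1.
otc_diff = differing_positions("CAAGTGAATGTC", "CAAGTTAATGTC")           # #92 vs #93
hprt_diff = differing_positions("TAATGACACAAACATG", "TAATGACATAAACATG")  # #383 vs #384

print("OTC  #92 vs #93:  ", otc_diff)
print("HPRT #383 vs #384:", hprt_diff)
```

With the sequences as printed, the HPRT 16-mers differ (C vs T) at the ninth base from the 5' end, which is the eighth base from the 3' end; the OTC 12-mers differ (G vs T) at the sixth base from the 5' end.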

DISCUSSION 

We have demonstrated competition between closely related synthetic oligonucleotide primers 
Figure 7. Identification of a single DNA base change in human DNA by COP using Taq DNA polymerase. 
Four COP reactions containing competing primers complementary to normal and mutant HPRT gene sequences 
(#383/384) and a common primer (#352) (Table One) were performed using 'preamplified' human genomic 
DNAs as templates. Left: Ethidium bromide-stained agarose gel electrophoresis of PCR and COP products. Lanes 
1 and 6 show the products of PCRs used to 'preamplify' the regions surrounding the normal and mutant alleles, 
respectively. Lanes 2-5 show COP products. 2, normal template, normal match primer labeled. 3, normal template, 
mutant match primer labeled. 4, mutant template, normal match primer labeled. 5, mutant template, mutant match 
primer labeled. Right: Autoradiograph of the dried agarose gel, showing the preferential incorporation of perfectly 
matched radiolabeled oligonucleotide primers. 



for a single DNA template. When coupled to PCR, COP provides a simple method for 
the identification of single DNA base differences and thus represents an alternative to ASO 
probing for the analysis of mutations for which the precise DNA sequences can be predicted. 
In contrast to ASO probing the COP procedure does not require the use of solid filter 
supports and is technically more simple to perform. The COP strategy may therefore be 
favored for the routine analysis of known single DNA base substitutions or polymorphisms. 

Successful COP has been shown to occur when competing oligonucleotides differ by 
single T-G, T-A, C-T or C-G base changes that generate A-A, G-G, G-A, T-C, C-A and 
T-G mismatches between the oligomers and their corresponding DNA templates. This 
represents six of the twelve possible base mismatches that may perturb normal 
Watson-Crick DNA base pairing. If reciprocal mismatches are equivalent (e.g., A-T vs 
T-A) then only T-T and C-C mispairings are not described in this study. In other experiments 
C-C and G-G mismatches have been identified by COP (J. S. Chamberlain, personal 
communication) but it is likely that the general efficiency of the primer competition will 
be determined by both the mismatched base that is involved and its surrounding sequence 
context. Therefore many mutations in different sequences may need to be examined before 
a base mispairing that cannot be identified by COP could be found. The M13mp18 DNA 
template amplification/restriction strategy described here, coupled with mixed 
oligonucleotide synthesis represents a convenient method for further study of these 
relationships. 
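
The mismatch bookkeeping in the paragraph above can be checked by enumeration. This is a minimal sketch under the paragraph's own assumption that reciprocal mismatches are equivalent; the variable names are illustrative and the list of observed mismatches is copied from the text.

```python
from itertools import product

BASES = "ACGT"
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

# All ordered primer-base/template-base combinations that are not Watson-Crick pairs.
ordered_mismatches = [
    pair for pair in product(BASES, repeat=2) if pair not in WATSON_CRICK
]

# Mismatches reported in this study, as listed in the text.
observed = ["A-A", "G-G", "G-A", "T-C", "C-A", "T-G"]

# Treat reciprocal mismatches as one class by ignoring order.
all_classes = {frozenset(pair) for pair in ordered_mismatches}
observed_classes = {frozenset(m.split("-")) for m in observed}
missing_classes = all_classes - observed_classes

def label(mismatch_class):
    """Format an unordered mismatch class, e.g. {'T'} as 'T-T', {'A', 'C'} as 'A-C'."""
    bases = sorted(mismatch_class)
    return "-".join(bases * 2 if len(bases) == 1 else bases)

missing_labels = sorted(label(c) for c in missing_classes)
print("ordered mismatches:", len(ordered_mismatches))
print("unordered classes:", len(all_classes))
print("not described:", missing_labels)
```

Counting ordered pairs gives the twelve mismatches mentioned in the text; collapsing reciprocal pairs leaves eight classes, of which six are covered and only C-C and T-T remain.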

The competing oligonucleotide primers described above are short (12-16 nucleotides). 
Although this length exceeds that necessary for efficient priming of DNA synthesis it is 
less than the usual length employed for ASO probing. The efficiency of the competition 
can be reduced when the oligonucleotide primers are 20-mers but COP is still effective 
with 16-mers. As a general rule we are continuing to construct 16 base oligomers for 
mutation identification as they can function both as ASO probes and competing primers, 
to further test the method. 

An important feature of this study was the adaptation of Taq DNA polymerase to the 
COP reactions. If COP is to be a favored method of mutation detection then the procedure 
should be more simple to perform than alternatives that offer the same specificity of DNA 
sequence discrimination. The substitution of Taq in place of Klenow and the accompanying 
development of automated thermocyclers has enabled the widespread acceptance of PCR 
technology. The demonstration that a single DNA base difference can be identified by 
COP using Taq allows a similar protocol for point mutation detection to that used for routine 
PCR DNA amplification. The region containing the mutant sequence is first amplified 
and a small aliquot of the PCR product taken to initiate the COP reaction. The radiolabeled 
COP products are then analyzed by gel electrophoresis and autoradiography to identify 
the individual alleles. 

A possibly important determinant of the relative hybridization efficiency of competing 
oligomers may be the position of the individual base mismatches within the oligonucleotides. 
In addition, the occurrence of a mismatch at the 3' terminus of an oligonucleotide may 
inhibit primer extension, provided the DNA polymerase used lacks a 3' to 5' exonuclease 
proofreading activity. Although this schema offers a strategy for mutation detection that 
is similar to COP in the manipulations and reagents that are required the underlying 
mechanism would be fundamentally different. In that case an oligonucleotide primer would 
not be required to bind specifically to the correctly matched allele, which contrasts to the 
central feature of the COP mechanism. 

A mutation detection technique requiring a 3' base mismatched oligonucleotide and DNA 
ligase has been recently described (11). The method allows mismatch detection by failure 
of head-to-tail ligation of two synthetic oligonucleotides at the site of a mutant DNA base. 
Excellent discrimination between each of the 12 possible base mismatches and the 
corresponding perfect matches has been reported. The ligation method has a similar 
advantage to COP in that solid filter supports are not necessarily required. Unlike the primer 
competition reactions the ligation conditions are not so easily compatible to PCR buffers 
and therefore more extensive sample manipulation may be required for the analysis of 
PCR amplified DNA sequences. 

There are many technical refinements that could be adapted to improve the COP method. 
More than two oligonucleotides can simultaneously compete for the same DNA priming 
site and multiple fluorescently labeled oligonucleotides (12) could be used to simultaneously 
test regions with a high degree of genetic heterogeneity. The maximum number of 
oligonucleotide species that could be employed in a single reaction has not yet been 
established but the observation that competition can occur when a mismatched primer is 
present at 100-fold higher abundance than the correct match suggests that even very highly 
polymorphic loci may be amenable to single analyses. Addition of a biotin residue to the 
5' terminus of a common primer (13) could further facilitate the method by allowing rescue 
of a PCR-amplified fragment that has incorporated the successfully competing, differentially 
labelled primer by an avidin bound support. In this case the final analysis of the COP 
products would not require gel electrophoresis and could be monitored by measurement 
of the radioactive or fluorescent incorporation of the support matrix. Such refinements 
may eventually lead to the complete automation of DNA base difference detection for genetic 
disease diagnosis and the analysis of other DNA sequence polymorphisms in complex 
genomes. 
ABSTRACT Crigler-Najjar syndrome type I is character- 
ized by unconjugated hyperbilirubinemia resulting from an 
autosomal recessive inherited deficiency of hepatic UDP- 
glucuronosyltransferase (UGT) 1A1 activity. The enzyme is 
essential for glucuronidation and biliary excretion of biliru- 
bin, and its absence can be fatal. The Gunn rat is an excellent 
animal model of this disease, exhibiting a single guanosine (G) 
base deletion within the UGT1A1 gene. The defect results in a 
frameshift and a premature stop codon, absence of enzyme 
activity, and hyperbilirubinemia. Here, we show permanent 
correction of the UGT1A1 genetic defect in Gunn rat liver with 
site-specific replacement of the absent G residue at nucleotide 
1206 by using an RNA/DNA oligonucleotide designed to 
promote endogenous repair of genomic DNA. The chimeric 
oligonucleotide was either complexed with polyethylenimine 
or encapsulated in anionic liposomes, administered i.v., and 
targeted to the hepatocyte via the asialoglycoprotein receptor. 
G insertion was determined by PCR amplification, colony lift 
hybridizations, restriction endonuclease digestion, and DNA 
sequencing, and confirmed by genomic Southern blot analysis. 
DNA repair was specific, efficient, stable throughout the 
6-month observation period, and associated with reduction of 
serum bilirubin levels. Our results indicate that correction of 
the UGT1A1 genetic lesion in the Gunn rat restores enzyme 
expression and bilirubin conjugating activity, with conse- 
quent improvement in the metabolic abnormality. 



UDP-glucuronosyltransferases (UGTs) are a family of mem- 
brane-bound enzymes that catalyze the conjugation of numer- 
ous xenobiotics and endogenous substrates with glucuronic 
acid. Of the known isoforms, only UGT1A1 is physiologically 
relevant in bilirubin glucuronidation and biliary excretion of 
this potentially toxic metabolite (1, 2). Crigler-Najjar (CN) 
syndrome is the inherited deficiency of hepatic UGT1A1 
activity and is characterized by elevated serum levels of 
unconjugated bilirubin (3). Of the two types of CN syndrome, 
type I is more severe and is characterized by a nearly complete 
absence of UGT1A1 activity, whereas incomplete deficiency of 
the enzyme is associated with the less severe type II form (4, 5). 

The homozygous Gunn rat, a mutant strain of Wistar rat, is 
an accurate animal model for CN syndrome type I. Its liver 
lacks UGT1A1 activity because of the deletion of a single 
guanosine (G) base in UGT1A1 that results in a frameshift and 
a premature stop codon (6, 7). Recombinant adenoviral 
vectors have been used in vivo to correct the hyperbiliru- 
binemia in the Gunn rat with persistent expression of the 
human bilirubin UGT1A1 enzyme for as long as 2 months (8, 
9). Significant progress also has been made in overcoming the 
immunogenicity of the adenoviral-based vectors (9-11), but 
their use requires repeated treatments and immunomodula- 
tion of the host to maintain therapeutic levels of UGT1A1. 

A novel approach, based on mechanisms of DNA repair 
(12), was reported to correct single nucleotide mutations in 
episomal and genomic DNA (13, 14). It was observed that an 
oligonucleotide (ON) composed of both DNA and RNA 
exhibited increased pairing efficiency with a genomic DNA 
target (15, 16). The chimeric RNA/DNA ON was designed for 
increased stability, resistance to nucleases, and improved 
localization to genomic target sites (17). In a typical duplex 
structure, the double-stranded region of the molecule is 
capped by single-stranded thymidine hairpins. The 5' and 3' 
ends of the molecule are juxtaposed and sequestered by using 
a 5-bp GC clamp at the 3' end. The RNA residues are 
2'-O-methylated to prevent RNase H degradation as well as to 
improve the formation of joint molecules (18). The homology 
segment between the RNA/DNA ON and its genomic target 
is designed with a single mismatch, which promotes the 
site-directed genomic alteration by endogenous repair path- 
ways (17, 19). 

We have used this technology previously to introduce site- 
specific missense mutations in genomic DNA in cultured 
human hepatoma cells (20) and in nonreplicating isolated rat 
hepatocytes (20, 21). In addition, >40% of the rat hepatic 
factor IX alleles were mutated in vivo by using a nonviral 
delivery system targeted to the hepatocyte via the asialogly- 
coprotein receptor (21, 22). Both the genomic and phenotypic 
changes were stable for more than 1 year in quiescent as well 
as regenerated livers. 

Here, we demonstrate that chimeric RNA/DNA ONs can be 
used for site-directed insertion of a single G nt in genomic 
DNA from cultured hepatocytes and intact liver of the Gunn 
rat. The repair process is dose dependent and associated with 
restoration of the wild-type BstNI restriction endonuclease site 
in the UGT1A1 gene. In addition, the phenotypic change is 
characterized by the hepatic appearance of UGT1A1 protein, 
secretion of conjugated bilirubin in bile, and decreased serum 
bilirubin levels. This strategy of genomic alteration circum- 
vents many of the disadvantages associated with viral vector- 
mediated gene transfer. Our results suggest that site-directed 
gene repair offers an attractive alternative to gene augmen- 
tation using recombinant viruses or hepatocyte transplanta- 
tion (23) in the treatment of CN syndrome type I. 

MATERIALS AND METHODS 

Synthesis of the Chimeric ONs. The chimeric RNA/DNA 
ONs were obtained from Kimeragen (Newtown, PA). They 
were synthesized by using DNA and 2'-O-methyl RNA phos- 
phoramidite nucleoside monomers as described (13). After 
deprotection and purification by HPLC, more than 98% of the 
purified ONs were full length. Fluorescently labeled ONs were 
synthesized by using a fluorescein-modified deoxynucleotide 
at the initial 5' position of the all-DNA strand. 

Polyethylenimine (PEI) and Liposomal Formulations. The 
25-kDa PEI (Fluka) was lactosylated by using sodium cya- 
noborohydride (Sigma) as described (22). For in vitro trans- 
fections, the chimeric ONs were combined with PEI at nine 
equivalents of PEI nitrogen per ON phosphate in 0.15 M NaCl. 
For in vivo delivery, the chimeric ONs were complexed with 
PEI at an ON phosphate/PEI amine ratio of 1:6 in 5% dextrose 
(22, 24). 

Lipid films of dioleoyl phosphatidylcholine/dioleoyl phos- 
phatidylserine/galactocerebroside (Avanti Polar Lipids) were 
prepared at a 1:1:0.16 molar ratio, hydrated, and extruded 
down to 0.5 μm as described (22). For in vitro transfections, 150 
μg of UGT1A1/0.5 ml of 0.15 M NaCl was used to hydrate a 
0.5-mg lipid film. For in vivo delivery, 600 μg of ON/ml of 5% 
dextrose was used to hydrate a 2-mg lipid film. Fluorescently 
labeled ONs were encapsulated in the anionic liposomes by 
using the same methods. The encapsulation efficiency of the 
RNA/DNA ONs was >80%. 

Cell Culture and Transfections. Gunn rat hepatocytes im- 
mortalized by using the simian virus 40 temperature-sensitive 
large T-antigen were maintained at the permissive (33°C) 
temperature in supplemented DMEM (25). Cells were de- 
tached by using trypsin-EDTA and replated at a density of 2 × 
10^5 cells per 35-mm Primaria (Becton-Dickinson) dish at the 
nonpermissive temperature (37°C) 24 h before transfection. 
Cells were transfected in 1 ml of the same medium supple- 
mented with 2.5 mM CaCl2 by using a 100-μl aliquot of 
transfecting solution containing the chimeric ONs complexed 
to PEI or vehicle alone. After 18 h, 2 ml of medium was added, 
and the hepatocytes were maintained for an additional 30 h at 
37°C before harvesting by scraping. For repeat transfections, 
the medium was removed after 48 h and replaced, and the cells 
were transferred to 33°C for expansion. One week later, the 
cells were prepared, transfected, and harvested as outlined 
above. 

Gunn rat hepatocytes were isolated by collagenase perfu- 
sion as described (25) and plated at a density of 1 × 10^6 
cells/T25 flask in a chemically defined medium (hepatocyte 
growth medium, HGM) (26). Cells were transfected with 300 
μl of the liposome-encapsulated chimeric ONs, or vehicle 
alone, in 3 ml of HGM supplemented with 10% heat- 
inactivated FBS and 2.5 mM CaCl2. Three milliliters of FBS- 
supplemented HGM was added 18 h after transfection, and the 
cultures were maintained an additional 30 h at 37°C. Parallel 
transfections of Gunn rat hepatocytes were done with the 
fluorescein-labeled chimeric ONs, and the cells were analyzed 
by confocal microscopy as described (20, 22). 

In Vivo Delivery Systems. Male rats (≈65 g; Harlan Sprague- 
Dawley) received 200 μg of fluorescein-labeled chimeric ONs 
that were either naked, encapsulated in anionic liposomes, or 
complexed to PEI in 5% dextrose by tail vein injection. For 
asialoglycoprotein receptor competition, animals received bolus 
injections of 5 mg/100 g body weight of asialofetuin (ASF) in 0.15 
M NaCl 1 min before and 3 min after injection of the fluorescently 
labeled ONs (27). Tissue samples were frozen in OCT, and the 
cryosections were fixed for 10 min with 4% paraformaldehyde 
(wt/vol) in PBS, pH 7.4. Tissue distribution of the fluorescently 
labeled ONs was determined by confocal microscopy as described 
(21). 

Gunn rats (≈80 g; Harlan Sprague-Dawley) were adminis- 
tered aliquots of 200 μg of UGT1A1 either complexed to PEI 
or encapsulated in anionic liposomes, or an equal amount of 
vehicle alone by tail vein injection in 5% dextrose on 5 
consecutive days. Seven days and 4 months postinjection, 
random liver tissue samples were removed for DNA isolation. 
At 6 months, bile samples were collected from the animals as 
described (10) as well as blood and liver tissue for enzyme 
activity, DNA, and Western blot analysis. 

A separate group of Gunn rats (≈200 g) was injected on 5 
consecutive days with either vehicle, or the chimeric ONs 
complexed to PEI or encapsulated in liposomes. A total dose 
of 3 mg/rat (600 μg/day × 5) was administered by tail vein 
injection in 5% dextrose. Rats treated a second time received 
the same dosing schedule. Blood was drawn under ether 
anesthesia for serum bilirubin levels and alanine aminotrans- 
ferase activity (Sigma). Bile samples were collected by bile duct 
cannulation as described (10). 

PCR Amplification, Cloning, and Analysis. Genomic DNA 
larger than 100 bp was isolated by using the high pure PCR 
template preparation kit (Boehringer Mannheim). DNA from 
liver tissue samples was isolated as described (21). PCR 
amplification (30 cycles of 94°C for 45 s, 55°C for 20 s, and 72°C 
for 45 s) of a 379-nt region of the rat UGT1A1 gene using the 
primers 5'-GGGATTCTCAGAATCTAGACATT-3' (sense) 
and 5'-GTGTGTGGTATAAATGCTGTAGG-3' (antisense) 
(28) was performed with 300 ng of the isolated DNA. To rule 
out PCR artifacts, 1 μg of UGT1A1 alone, or 300 ng of Gunn 
rat DNA incubated with up to 1.5 μg of the UGT1A1 chimeric 
ON, was subjected to PCR amplification. The amplification 
products were subcloned into the TA cloning vector pCR 2.1 
(Invitrogen), and the ligated material was used to transform 
frozen competent Escherichia coli. 

After plating, the colonies were lifted onto Micron Separa- 
tions MagnaGraph nylon filters, replicated, and processed for 
hybridization with 32P-end-labeled 17-mer ON probes 1206A 
(5'-ATGTCCTGAAATGACTG-3') or 1206G 
(5'-ATGTCCTGGAAATGACT-3'). Hybridizations were performed at 
37°C for 24 h and the filters were washed as described (20). 
Plasmid DNA prepared from colonies hybridizing with 1206A 
or 1206G was sequenced on an ABI 370A sequencer (Perkin- 
Elmer) by using the mp13 forward and reverse primers as well 
as a gene-specific primer 5'-CCCATGGTATTTATGAAGGAATATGC-3' 
corresponding to nucleotides 1071-1106 of 
the rat UGT1A1 cDNA (7). The PCR amplicons from DNA 
samples isolated after 6 months were subjected to BstNI 
restriction endonuclease digestion and separated by using 1% 
agarose gel electrophoresis to distinguish the wild type from 
the mutant UGT1A1 Gunn rat gene sequence (29). 

Southern and Western Blot Analyses. Genomic DNA from 
the 6-month liver samples was digested sequentially with 
EcoRl then ZfrfNI, and the fragments were resolved by elec- 
trophoresis through a 1% agarose gel. After capillary transfer 
to nitrocellulose membrane, the blots were hybridized for 24 h 
at 65°C in 6x SSC containing 1% SDS, 5x Denhardt's, and 200 
jig/ml denatured sonicated fish sperm DNA with 32 P-labeled 
probe corresponding to the 379-nt PCR-amplified fragment of 
the rat UGT1A1 gene. After hybridization, the filters were 
washed in 1× standard saline phosphate/EDTA (0.154 M 
NaCl/10 mM phosphate, pH 7.4/1 mM EDTA; SSPE), 0.5% 
SDS and then 0.1× SSPE, 0.5% SDS at room temperature and 
37°C, respectively, and analyzed by phosphoimaging. 

Total homogenate and microsomes were isolated from the 
flash-frozen liver tissue samples by using the buffers and 
procedures outlined (30). Protein concentrations were deter- 
mined with the Bio-Rad protein assay kit. Aliquots of 100 μg 
of total or microsomal proteins were separated by using 7.5% 
SDS/PAGE. After electrophoretic transfer onto nitrocellu- 
lose membranes, immunoblots were incubated sequentially 
with H2O2, 5% milk blocking solution, primary antibody 
(1:5,000) to rat UGT1A1 (28), and horseradish peroxidase- 
conjugated goat anti-rabbit IgG secondary antibody. UGT1A1 
protein was detected by using the Ultra chemiluminescent 
system (Pierce). 

Serum Bilirubin, UGT1A1 Activity, and HPLC Analysis. 
Serum bilirubin concentrations were determined in blood 
drawn from the Gunn rats by using a Sigma diagnostic kit. The 
UGT1A1 enzyme activity was assayed in digitonin-activated 
liver homogenates with bilirubin as the acceptor aglycone, as 
described (1, 2). Bile pigments collected from the cannulated 
bile ducts of each animal group were evaluated for bilirubin 
glucuronidation by HPLC analysis as described (11). Authentic 
pigments were used as standards, and pigments were identified 
by retention times. 

RESULTS 

UGT1A1 Correction in Cultured Hepatocytes. We designed 
the chimeric ONs with the hybrid RNA/DNA strand targeting 
the nontranscribed DNA sequence of the UGT1A1 gene (Fig. 
1). The sequence of the RNA/DNA molecule was identical to 
that of the mutant gene with one change. An additional G was 
placed as the center nt within the stretch of nine DNA residues, 
flanked on both sides by blocks of modified RNA. The 
genomic target site corresponded to nucleotide 1206 of the 
complementary strand of the mutant cDNA (7). 

Both the liposomal and PEI delivery systems were targeted 
to the hepatocyte asialoglycoprotein receptor (22, 31). Gunn 
rat hepatocytes initially were transfected with the fluorescently 
labeled ONs at 150, 180, and 300 nM concentrations. There 
was significant cell uptake and nuclear localization of the 
labeled chimeric molecules in both the immortalized and 
primary Gunn rat hepatocytes (data not shown). These cells 
then were transfected with unlabeled UGT1A1 ONs, which 
were either complexed to PEI or encapsulated in the anionic 
liposomes. The frequency of G nt insertion at position 1206 
was determined by hybridization of duplicate colony lifts of the 
PCR-amplified and cloned 379-nt stretch of exon 4 of the rat 
UGT1A1 gene (28). 

The filter lifts were hybridized with the 32P-end-labeled ON 
probes 1206A and 1206G (Fig. 2A). The overall frequency of 
conversion of the targeted nt was calculated by dividing the 
number of clones hybridizing with the 1206G probe by the total 
number of clones hybridizing with both probes. G insertion was 
observed only in hepatocytes transfected with UGT1A1, and 
not in cells transfected with vehicle or nonspecific chimeric 
ONs. Additionally, no hybridization of the 1206G probe 
occurred in clones derived from DNA isolated from untreated 
hepatocytes and PCR-amplified in the presence of 0.5-1.5 μg 
of the UGT1A1 ON. Nucleotide insertion was dose dependent 
and was as high as 15.3%. In addition, the frequency of G 

Lys*** 

5' AGCTGGGGTGACCCTGAATGTCCTGAAATGACTGCCGATGATTTG 3' 
3' TCGACCCCACTGGGACTTACAGGACTTTACTGACGGCTACTAAAC 5' 

  TGCGCG-gggacuuacaGGACCTTTAcugacggcuaT 
T TCGCGC CCCTGAATGTCCTGGAAATGACTGCCGAT T 

GluMet 

AGCTGGGGTGACCCTGAATGTCCTGGAAATGACTGCCGATGATTTG 
TCGACCCCACTGGGACTTACAGGACCTTTACTGACGGCTACTAAAC 
Fig. 1. Targeting strategy to correct the UGT1A1 frameshift 
mutation in the Gunn rat. The 2'-O-methylated RNA residues of the 
targeting RNA/DNA ON (blue) are indicated in lowercase and the 
DNA residues in capital letters. Blocks of 10 modified RNA residues 
flank both sides of a 9-residue stretch of DNA, which contains the base 
change required for correction. The ON sequence is complementary 
to 28 residues of genomic DNA spanning the site of mutation with the 
exception of a G base (orange) targeted for position 1206. The cell's 
endogenous DNA repair process mediates insertion of G at the target 
site, thereby correcting the frameshift mutation and restoring 
UGT1A1 activity. The folded double-hairpin structure containing four 
T residues in each loop, a 5-bp GC clamp, and the modified RNA 
residues significantly improve resistance to nuclease degradation. 
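
The block layout described in the legend can be illustrated with a short sketch. It derives the RNA/DNA targeting strand as the base-by-base complement of the 29-nt all-DNA homology strand read from Fig. 1, writing the two 10-residue flanks as RNA (lowercase, u in place of t) and the central 9 residues as DNA. The function name and its default block sizes are illustrative assumptions, and the hairpin loops and GC clamp are left out.

```python
# Minimal sketch: build the RNA/DNA strand of a chimeric ON from its
# all-DNA homology strand. Names here are hypothetical, not from the paper.

DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def chimeric_strand(homology: str, rna_flank: int = 10, dna_core: int = 9) -> str:
    """Return the paired strand, aligned base by base under `homology`.

    RNA residues are lowercase (u instead of t); DNA residues are uppercase.
    The result reads 3'->5' when `homology` reads 5'->3'.
    """
    if len(homology) != 2 * rna_flank + dna_core:
        raise ValueError("homology strand length does not match block sizes")
    paired = "".join(DNA_COMPLEMENT[base] for base in homology.upper())
    left = paired[:rna_flank]
    core = paired[rna_flank:rna_flank + dna_core]
    right = paired[rna_flank + dna_core:]

    def to_rna(block: str) -> str:
        # RNA pairs A with U, so swap T for U and mark the block lowercase.
        return block.replace("T", "U").lower()

    return to_rna(left) + core + to_rna(right)


# Homology strand as printed in Fig. 1 (it carries the corrective G).
HOMOLOGY = "CCCTGAATGTCCTGGAAATGACTGCCGAT"
print(chimeric_strand(HOMOLOGY))
```

The central DNA block comes out as GGACCTTTA, the complement of the CCTGGAAAT stretch that carries the extra G.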

Fig. 2. Filter lift hybridizations and sequence analysis of DNA 
from isolated hepatocytes. (A) Representative hybridization patterns 
of duplicate filter lifts of the cloned PCR amplicons with either 
32P-labeled mutant 1206A or wild-type 1206G 17-mer probes. Hepa- 
tocytes were transfected with vehicle (Left) or UGT1A1 ON (Right). 
(B) The nt sequence of plasmid DNA isolated from clones hybridized 
with probes to mutant 1206A or wild-type 1206G displaying either A 
(arrow, Left) or G (arrow, Right), respectively. 

insertion increased to 23.7% after a second transfection of the 
immortalized Gunn rat hepatocytes. 
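
The frequency calculation described above (1206G-positive clones divided by all clones scored with either probe) can be sketched as follows. The function name and the clone counts are hypothetical, chosen only to show the arithmetic; they are not data from the study.

```python
# Minimal sketch of the conversion-frequency arithmetic; counts are made up.

def conversion_frequency(clones_1206g: int, clones_1206a: int) -> float:
    """Percentage of scored clones that hybridize with the 1206G probe."""
    if clones_1206g < 0 or clones_1206a < 0:
        raise ValueError("clone counts must be non-negative")
    total = clones_1206g + clones_1206a
    if total == 0:
        raise ValueError("no clones were scored")
    return 100.0 * clones_1206g / total


# Hypothetical plate: 23 clones positive for 1206G, 127 positive for 1206A.
print(f"{conversion_frequency(23, 127):.1f}%")
```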

We confirmed our results from the filter hybridizations by 
direct sequencing of at least 12 independent clones of the 
wild-type and mutant genes (Fig. 2B). The results indicated 
that colonies hybridizing to only 1206A exhibited the mutant 
sequence. In contrast, those colonies derived from UGT1A1- 
transfected Gunn rat hepatocytes hybridizing to the wild-type 
1206G ON probe displayed a G at position 1206. The entire 
379-nt PCR-amplified region of the UGT1A1 gene was se- 
quenced for all of the clones and no alterations other than the 
directed change at the target site were observed. Finally, the 
start and end points of the 379-nt PCR-amplified genomic 
DNA samples corresponded exactly to those of the primers 
used for the amplification process, indicating that the clones 
sequenced were derived from genomic DNA, rather than 
nondegraded chimeric ONs. 

In Vivo Characterization of the Anionic Liposome and PEI 
Delivery Systems. The fluorescently labeled ONs, using either 
PEI or liposomes, were distributed homogeneously throughout 
the liver as early as 2 h after tail vein injection (Fig. 3). In 
contrast, there was only minimal uptake in lung, heart, and 








Fig. 3. In vivo hepatic distribution of fluorescently labeled ONs. 
Rats received 200 μg of 5' fluorescein-labeled chimeric ONs encap- 
sulated in anionic liposomes or complexed with PEI by single bolus tail 
vein injection. At the indicated times, their livers were processed and 
examined by confocal microscopy. Lip, liposomes. (Bar = 100 μm.) 

kidney. Coadministration of ASF, which binds avidly to the 
asialoglycoprotein receptor (27), almost totally inhibited the 
hepatic uptake of the ONs. This finding was associated with 
increased levels of the labeled molecules in the other organs. 
Interestingly, no detectable fluorescence was present in the 
testis even with coadministration of ASF. In animals injected 
with naked fluorescein-labeled ONs, there was almost no 
detectable fluorescence in the liver at 2 h. However, distribu- 
tion to the other organs was similar to that observed when ASF 
was coadministered to inhibit liver uptake (data not shown). 

We then characterized the time course for hepatic disap- 
pearance of the fluorescein label in rats injected with the 
liposome-encapsulated ONs (Fig. 3). No significant change 
was observed until 24 h postinjection when the fluorescence 
began to decline. There was a dramatic decrease throughout 
the liver by 48 h, and by 120-168 h there was only background 
fluorescence. Disappearance of the fluorescein label in the 
other tissues mirrored that observed in the liver. The same 
pattern of distribution and disappearance was observed when 
the ONs were complexed to PEI. 

In Vivo Correction of the Hepatic UGT1A1 Mutation. Chi- 
meric ONs complexed with PEI or encapsulated in the anionic 
liposomes were administered in vivo by tail vein injection. 
Random samples of liver were harvested at 7 days and 4 and 
6 months postinjection. Liver DNA was isolated, and the 
379-nt sequence spanning the target site was PCR-amplified. 
Duplicate filter lifts of the transformed colonies were hybrid- 
ized with the 17-mer ONs to either wild-type 1206G or mutant 
1206A. Insertion frequency of G at the genomic target site was 
≈20% with either delivery system (Fig. 4A) and was undetectable 

[Fig. 4 image. (A) Filter lifts for Vehicle and UGT1A1 probed with 1206A (mutant) or 1206G (wild type). (B) Sequence traces for wild type, Gunn (vehicle), and Gunn treated.] 

Fig. 4. Filter lift hybridizations, restriction fragment length poly- 
morphism, and sequence analysis of DNA isolated from liver. (A) 
Hybridization patterns of duplicate filter lifts of the cloned PCR 
products from liver DNA of Gunn rats 6 months postinjection with 
vehicle (Upper) or UGT1A1 ONs (Lower). (B) PCR amplicons were 
subjected to BstNI restriction enzyme digestion and analyzed by 
agarose gel electrophoresis and ethidium bromide staining (Top). 
Direct DNA sequencing of the PCR-amplified UGT1A1 gene sur- 
rounding the targeted G insertion site at position 1206 (arrow) is 
shown for wild-type (G, top sequence), vehicle (A, middle), and 
UGT1A1-treated Gunn rats (A and G, bottom). The size of the DNA 
standards is indicated at top left. 

in the control groups. The frequency remained stable 
at ≈20% even when the same livers were analyzed 4 and 6 
months postinjection (Table 1). The PCR amplicons from the 
6-month samples were subjected to restriction endonuclease 
digestion with BstNI. Agarose gel analysis indicated partial 
cleavage at the wild-type BstNI site, whereas DNA from the 
vehicle controls remained resistant (Fig. 4B, Top). Finally, the 
379-nt PCR-amplified DNA fragments were sequenced to 
confirm G insertion. Amplicons from the UGT1A1 livers 
exhibited a mix of wild-type G and mutant A at position 1206 
(Fig. 4B, Bottom), whereas the control groups displayed only 
the mutant A (Middle). 
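
The restriction readout can be illustrated with a small check: inserting the G at the target turns CCTGA into CCTGG and so creates a BstNI recognition site (CCWGG, where W is A or T). The two sequences are the 45- and 46-nt windows read from Fig. 1; the mutant strand is repaired from a damaged scan and should be treated as an assumption. The helper name is hypothetical.

```python
import re

# BstNI recognizes CCWGG (W = A or T). The site reads the same on both
# strands (reverse complement of CCAGG is CCTGG), so one strand is enough.
BSTNI_SITE = re.compile(r"CC[AT]GG")

# Windows around position 1206 as read from Fig. 1 (mutant strand assumed).
MUTANT = "AGCTGGGGTGACCCTGAATGTCCTGAAATGACTGCCGATGATTTG"
CORRECTED = "AGCTGGGGTGACCCTGAATGTCCTGGAAATGACTGCCGATGATTTG"


def bstni_sites(sequence: str) -> list:
    """Return 0-based start positions of BstNI sites in `sequence`."""
    return [match.start() for match in BSTNI_SITE.finditer(sequence.upper())]


# The corrected window is the mutant window with one G inserted.
assert MUTANT[:25] + "G" + MUTANT[25:] == CORRECTED

print("mutant sites:", bstni_sites(MUTANT))
print("corrected sites:", bstni_sites(CORRECTED))
```

Within this window the mutant sequence has no site and the corrected sequence has one, which is why partial cleavage of the amplicon indicates a mixture of corrected and uncorrected alleles.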

Southern and Western Blot Analyses. DNA was isolated 
from a variety of liver tissue samples for genomic Southern blot 
analysis. In fact, DNA isolated from animals that were admin- 
istered the UGT1A1 ON showed partial restoration (~25%) 
of the BstNI restriction site in exon 4 of the UGT1A1 gene (29) 
(Fig. 5A). In contrast, the control samples showed no cleavage 
with BstNI at this site, whereas the wild-type DNA was 
completely cleaved. The results from the Southern blot anal- 
yses were similar for both the PEI and liposomal delivery 
systems. 

Total and microsomal proteins were isolated from liver 
tissue samples and subjected to Western blot analysis for 
detection of the 52-kDa UGT1A1 protein. The results (Fig. 
5B) indicated that repair of the UGT1A1 gene sequence was 
associated with appearance of the bilirubin-conjugating en- 
zyme. In contrast, there was no detectable UGT1A1 protein in 
control samples. The protein was enriched in the microsomal 
fraction and expressed at 8-15% of wild-type levels, in agree- 
ment with the observed enzyme activity in these samples. 

Effect of UGT1A1 Gene Correction on Serum Bilirubin 
Levels. The serum bilirubin levels of the Gunn rats were 
monitored after tail vein injection and indicated that a single 
dosing regimen of the UGT1A1 molecule, using either PEI or 
liposomes, resulted in an ≈25% decrease in serum bilirubin 
levels (Fig. 6). In contrast, rats administered vehicle or non- 
specific ON showed no change, or even an increase in their 
serum bilirubin levels. A repeat dosing with UGT1A1 resulted 
in a further drop in serum bilirubin to <50% of the pretreat- 
ment levels, whereas no significant change was observed in the 
control rats. Blood studies for routine liver enzymes were 
performed with both delivery systems, and no changes were 
detected. Moreover, histologic examination of the livers 6 
months after administration indicated that neither PEI, an- 
ionic liposomes, nor the chimeric ONs altered liver morphol- 
ogy (data not shown). 

Hepatic UGT1A1 enzyme activity was confirmed by bile 
duct cannulation and HPLC analysis of bilirubin glucuronida- 
tion. In fact, bilirubin mono- and diglucuronides were detected 
only in those Gunn rats that were administered the UGT1A1 
chimeric ONs (Fig. 7). No significant differences were detected 
between the PEI and liposomal delivery systems, and in both 
groups the bilirubin was conjugated primarily as the mono- 
glucuronidated species. Only unconjugated bilirubin was 
present in the bile of the control Gunn rats. 

Table 1. In vivo G insertion at nucleotide 1206 of the UGT1A1 
gene in Gunn rat livers 

               UGT1A1                   Insertion, % 
Vehicle        dosage, mg    1 week       4 mos        6 mos 
Liposomes      1             20.5 ± 6.1   17.3 ± 5.1   19.9 ± 3.0 
PEI            1             23.0 ± 1.4   19.3 ± 2.1   20.7 ± 0.3 
PEI control    0             n.d.         n.d.         n.d. 

[...] random liver tissue samples determined by filter lift hybridizations as 
described in Materials and Methods. Each treatment group contained 
at least three animals. n.d., not detectable. 

Fig. 5. Southern and Western blot analyses of Gunn rat livers. 
Liver tissue was harvested for DNA and protein analysis 6 months after 
in vivo administration of the UGT1A1 ONs as described in Materials 
and Methods. (A) Southern blot analysis after sequential digestion of 
genomic DNA with EcoRI and BstNI. (B) Western blot analysis of 
total liver homogenate and microsomal extracts from the UGT1A1- 
and vehicle-treated Gunn rats. DNA size markers and protein molec- 
ular mass are indicated at left. 

DISCUSSION 

Chimeric RNA/DNA ONs have been used successfully for 
single nt substitution in episomal and genomic DNA of rep- 
licating cells (13, 14, 20, 32). They also have mediated efficient 
genomic site-specific nt exchange in isolated nonreplicating as 
well as quiescent hepatocytes in vivo (21, 22). The purpose of 
this study was to establish whether chimeric ONs could effect 
site-specific replacement of a G residue to correct the frame- 
shift mutation in exon 4 of UGT1A1 in Gunn rats. Our results 
demonstrate efficient correction of the genetic lesion in both 
immortalized and primary Gunn rat hepatocytes, as well as the 
liver in situ. In addition, the long-term change together with 
our previous results with mutation of the rat factor IX gene 
(22) suggest that correction of the UGT1A1 gene was perma- 
nent. 

The genomic insertion of G at the targeted site was not an 
artifact of PCR amplification, as recently suggested (33). 
Specifically, neither the control groups nor DNA samples 
spiked with the UGT1A1 ONs yielded wild-type clones. Also, 
despite the almost complete hepatic disappearance of the ONs 
from the liver by 48 h, the frequency of G insertion at 1 week 
was comparable with that observed 4 and 6 months later in the 
same livers. Furthermore, hepatic correction of the UGT1A1 

Fig. 6. Effect of UGT1A1 gene correction on serum bilirubin 
levels in Gunn rats. Animals were administered UGT1A1 (blue 
squares) or nonspecific (red circles) ONs complexed to PEI or 
encapsulated in anionic liposomes as described in Materials and 
Methods. The dosage was repeated for all groups 30 days after the final 
injection of the first series (arrow). Each data point is the mean ± SD 
of 11 animals. There was no significant difference between the PEI and 
anionic liposome groups. P < 0.001 at ≥14 days for UGT1A1 ONs. 



[Fig. 7 image: HPLC traces; panel label, Control; x axis, retention time (min).] 

Fig. 7. HPLC analysis of bile pigments from Gunn rat livers. Bile 
ducts were cannulated and bile collected for HPLC analysis from both 
PEI- and liposome-treated Gunn rats as described in Materials and 
Methods. BMG, bilirubin monoglucuronide; BDG, bilirubin diglucu- 
ronide; UCB, unconjugated bilirubin. The HPLC profiles are repre- 
sentative of four animals in each experimental group. 

gene mutation was confirmed by genomic Southern blot 
analysis, expression of the 52-kDa UGT1A1 enzyme, and a 
significant reduction in serum bilirubin levels that has been 
maintained as long as 10 months without additional treatment. 
In contrast, serum bilirubin remained unchanged, and in some 
cases increased in control animals. Finally, UGT1A1 enzyme 
activity was confirmed by the appearance of both mono- and 
diglucuronidated bilirubin in the Gunn rat bile (1). 

The reduction in serum bilirubin levels was gradual and 
more closely resembled that observed with hepatocyte trans- 
plantation (23) rather than whole organ transplantation or 
overexpression of UGT1A1 transgenes (8, 10, 11). This finding 
may be explained by partial correction of the enzyme defect 
and slow release of bilirubin from the body stores of the Gunn 
rats, as well as zonal differences in hepatic UGT1A1 expres- 
sion. The greater proportion of bilirubin monoglucuronide 
relative to the diglucuronide in bile is also reminiscent of 
partial bilirubin UGT deficiency states, including CN syn- 
drome type II, Gilbert syndrome, and heterozygous Gunn rats 
(34, 35). The presence of a higher concentration of bilirubin 
relative to the number of UGT molecules favors the generation 
of bilirubin monoglucuronide over the formation of the diglu- 
curonide (36). Also, even with partial gene correction, in- 
creased enzyme expression could be achieved by transcrip- 
tional induction of UGT1A1 with several different agents, 
including phenobarbital (5). 
Based on the in vivo fluorescent studies, we estimate that 
100,000 ONs were delivered to hepatocyte nuclei with each 
tail vein injection. If both alleles are equally amenable to gene 
repair and only a minority of them are repaired, it is more 
probable that a single allele would be corrected than both 
alleles in a given cell. Consistent with this notion, it was 
reported recently that repair of the tyrosine missense mutation 
in albino melanocytes appears to occur in a single allele in 
clonal isolates passaged as many as 10 times (32). This occur- 
rence could be important in some diseases such as α1- 
antitrypsin deficiency, in which a codominant mutant protein 
may interfere with the function of the wild-type gene product 
(37). 
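
The single-allele argument can be made concrete under one simplifying assumption that the text does not state: each of the two alleles in a cell is repaired independently with the same probability p. A cell then has exactly one corrected allele with probability 2p(1 - p) and two with probability p^2, so single-allele correction is the more likely outcome whenever p < 2/3. The function name and the example value of p are illustrative only.

```python
# Minimal sketch, assuming independent repair of the two alleles in a cell
# with the same per-allele probability p (an assumption, not a measurement).

def allele_repair_probabilities(p: float) -> dict:
    """Probability that a diploid cell has 0, 1, or 2 repaired alleles."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return {
        "none": (1.0 - p) ** 2,
        "one": 2.0 * p * (1.0 - p),
        "both": p ** 2,
    }


# Illustrative per-allele repair probability of 0.2.
for outcome, probability in allele_repair_probabilities(0.2).items():
    print(f"{outcome}: {probability:.2f}")
```

With p = 0.2, a cell with one corrected allele is eight times as likely as a cell with both corrected.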

The incorporation of 2'-O-methylated RNA residues in the 
structure of the chimeric ON increases the efficiency of nt 
exchange compared with all-DNA ONs (13, 14, 16). It appears 
that the RNA/DNA strand of the duplex is responsible for the 
initial pairing event, whereas the mismatch within the all-DNA 
homology strand activates the endogenous DNA repair pro- 
cess (17). In fact, the human recombinase HsRec2 protein, 
which facilitates homologous pairing (38), significantly in- 
creased joint molecule formation between the RNA/DNA 
ONs and complementary single-stranded DNA compared with 
the all-DNA ONs (18). 

With a novel bacterial test system, the functional capacity of 
these chimeric molecules to promote targeted nt conversion 
was shown to require both RecA recombinase and MutS, a 
mismatch repair enzyme (19). The human MutS homolog 
MSH2 also was required for nt conversion in a mammalian 
cell-free assay system (39). A two-step process was proposed 
in which RecA mediates strand pairing and formation of a 
double D-loop, whereas MutS mediates genomic repair (19, 
39). In fact, MutS is active in mismatch repair pathways rather 
than in homologous recombination (40-42). It recently has 
been reported that the MutSα and MutSβ heterodimeric 
complexes of the mammalian MSH2 mismatch repair pathway 
are differentially expressed in cultured cells, and that the 
MSH2 protein is involved in modulating their levels (43, 44). 
Thus, cell lines with varying concentrations of these repair 
molecules may respond differently to genomic alteration by 
RNA/DNA ONs (45). 

The use of RNA/DNA ONs to correct genetic diseases of 
the liver offers significant advantages over viral-mediated 
transgene expression. In particular, it overcomes the random 
genomic integration associated with certain viral vectors, and 
the observed immunogenicity and lack of persistent gene 
expression in others. However, the approach does require the 
use of ONs that are designed specifically to each genetic 
mutation. The percent decrease in serum bilirubin levels 
achieved in this study would be sufficient to convert potentially 
lethal CN syndrome type I to a manageable CN syndrome type 
II phenotype. Additionally, the cumulative effect of the re- 
peated treatments coupled with the ability of this technology 
to induce site-specific nt alteration of genomic DNA without 
selection offers a potentially powerful technique for both ex 
vivo and in vivo gene therapy. 

A Ligase-Mediated Gene Detection Technique 

Ulf Landegren, Robert Kaiser, Jane Sanders, Leroy Hood 

An assay for the presence of given DNA sequences has been developed, based on the 
ability of two oligonucleotides to anneal immediately adjacent to each other on a 
complementary target DNA molecule. The two oligonucleotides are then joined 
covalently by the action of a DNA ligase, provided that the nucleotides at the junction 
are correctly base-paired. Thus single nucleotide substitutions can be distinguished. 
This strategy permits the rapid and standardized identification of single-copy gene 
sequences in genomic DNA. 



DNA ANALYSIS IS ATTAINING IN- 
creasing importance for the diag- 
nosis of disease caused by single- 
gene defects as well as for the detection of 
infectious organisms (1). Moreover, a num- 
ber of genes, predominantly those encoded 
in the major histocompatibility complex, 
have been found to be associated with an 
increased susceptibility to a variety of dis- 
ease states (2). Of a total of approximately 
2000 defined human genetic loci (3), ap- 
proximately 100 have currently been studied 
at the DNA level for their role in genetic 
disease (4). A number of genetic diseases are 
caused by alleles present in the population at 
relatively high frequencies, perhaps because 
of selective advantages to the heterozygous 
carriers (5). The ongoing characterization of 
disease-causing or disease-associated gene 
sequences makes large-scale screening for 
carrier status and genetic counseling a possi- 
bility. It may also sharpen the diagnostic 
accuracy for diseases such as autoimmune 
conditions where the susceptibility may be 
influenced by defined alleles. Such prospects 
are currently limited by the cumbersome 
nature of the available DNA detection meth- 
ods. 

A majority of polymorphisms in the hu- 
man genome are caused by point mutations 
that involve one or a few nucleotides. Cur- 
rent DNA analysis procedures capable of 
detecting the substitution of a single nucleo- 
tide are based on differential denaturation of 
mismatched probes as in allele-specific oli- 
gonucleotide hybridization (6) or denatur- 
ing gradient gel electrophoresis (7). Alterna- 
tively, the sequence of interest can be inves- 
tigated for polymorphisms that affect the 
recognition by a restriction enzyme (8) or 
that will allow ribonuclease A (RNase A) to 
cleave at mismatched nucleotides of an RNA 
probe hybridized to a target DNA molecule 
(9). Although denaturing gradient gel or 
RNase A can survey long stretches of DNA 
for mismatched nucleotides, they are esti- 
mated to detect only about half of all muta- 
tions that involve single nucleotides (7, 9). 
Similarly, less than half of all point muta- 
tions give rise to gain or loss of a restriction 
enzyme cleavage site (10). The only existing 
technique capable of identifying any single 
nucleotide difference, short of DNA se- 
quence analysis, is allele-specific oligonucle- 
otide hybridization. This technique involves 
immobilizing separated (6) or enzymatically 
amplified (11) fragments of target DNA, 
hybridizing with oligonucleotide probes, 
and washing under carefully controlled con- 
ditions to discriminate single nucleotide 
mismatches. 

We have devised a strategy that permits 
the facile distinction of known sequence 
variants differing by as little as a single 
nucleotide. The approach combines the abil- 
ity of oligonucleotides to hybridize to the 
sequence of interest and the potential of a 
DNA-specific enzyme, T4 DNA ligase, to 
distinguish mismatched nucleotides in a 
DNA double helix (Fig. 1). Two oligonu- 
cleotide probes are permitted to hybridize to 
the denatured target DNA such that the 3' 
end of one oligonucleotide is immediately 
adjacent to the 5' end of the other. The 
ligase can then join the two juxtaposed 
oligonucleotides by the formation of a phos- 
phodiester bond, provided that the nucleo- 
tides at the junction are correctly base-paired 
with the target strand. The ligation event 
thus positively identifies sequences comple- 
mentary to the two oligonucleotides. A het- 
erozygous sample is therefore scored as posi- 
tive for both alleles. The joining of the 
oligonucleotides may be conveniently dem- 
onstrated, for instance, by labeling one of 
the oligonucleotides with biotin and the 
other one with 32P. After the ligation reac- 
tion, the biotinylated oligonucleotides are 
allowed to bind to streptavidin immobilized 
on a solid support. Radioactive oligonucleo- 
tides that have become ligated to biotinylat- 
ed oligonucleotides remain on the support 
after washing and are detected by autoradi- 
ography. 
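
The detection logic can be sketched in a few lines. This is a deliberately simplified model: it scores a ligation only when both probes pair perfectly and head to tail on the target, which is stricter than the junction-only requirement described above. The 12-nt targets and 6-nt probes are toy sequences invented for the illustration, not the oligonucleotides used in the study, and all function names are hypothetical.

```python
# Toy model of ligase-mediated detection. Sequences and names are invented.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def ligates(target: str, allele_probe: str, invariant_probe: str) -> bool:
    """True if the probes anneal adjacently on `target` with full pairing.

    The 3' end of `allele_probe` (its variable nucleotide) is joined to the
    5' end of `invariant_probe`, so the ligated product is their
    concatenation and must be complementary to a stretch of the target.
    """
    product = allele_probe.upper() + invariant_probe.upper()
    return reverse_complement(product) in target.upper()


def genotype(targets, allele_probes, invariant_probe):
    """Return the set of allele names whose probe is ligated on any target."""
    return {
        name
        for name, probe in allele_probes.items()
        if any(ligates(target, probe, invariant_probe) for target in targets)
    }


# Two toy alleles that differ at a single position (G versus A).
TARGET_X = "CATGACGTTAGC"
TARGET_Y = "CATGACATTAGC"
ALLELE_PROBES = {"X": "GCTAAC", "Y": "GCTAAT"}  # variable base at the 3' end
INVARIANT_PROBE = "GTCATG"

print(sorted(genotype([TARGET_X], ALLELE_PROBES, INVARIANT_PROBE)))
print(sorted(genotype([TARGET_X, TARGET_Y], ALLELE_PROBES, INVARIANT_PROBE)))
```

A sample containing both targets is scored positive with both allele-specific probes, which mirrors how a heterozygous sample is read.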

The gene encoding human β globin was 
selected as a model system to test the tech- 
nique. There are two relatively frequent 
alleles, β^S and β^C, each differing from the 
normal allele, β^A, by a single nucleotide 
substitution in positions 2 and 1, respective- 
ly, of codon six (Figs. 2 and 3) (12). Sub- 
jects homozygous for the β^S allele suffer 
from sickle cell anemia. Moreover an in- 
creased risk of sudden death during exertion 
has been observed among individuals het- 
erozygous for β^S (13). 

The ligase-mediated gene detection pro- 
cedure was used to distinguish β^A and β^S 
genes in equivalent amounts of DNA pre- 
sent in cells, in cloned DNA, and in genomic 
DNA (Fig. 2). One of two synthetic oligo- 
nucleotides (B131 or B132), specific for 
each of the alleles, was used in conjunction 
with another oligonucleotide (P133) hy- 
bridizing immediately 3' to either of the 
other two oligonucleotides on the target 
DNA strand. All of the synthetic oligomers 
used in this study are 20 nucleotides long. 
The ability of T4 DNA ligase to join the 
variable, 3' nucleotide of the allele-specific 
oligonucleotides to the 5' terminus of the 
invariant oligonucleotide was assessed by 
capturing any ligated product on streptavi- 
din-agarose beads. The beads were filtered 
and washed to remove unbound oligonucle- 
otides, and then the filter with trapped 
beads was exposed to x-ray film. The 10^6 
nucleated cells used for one assay were ob- 
tained from ~0.5 ml of blood. The cells 
were used in the assay without DNA purifi- 
cation, by first making the DNA accessible 
for the ligase-mediated analysis by sequen- 
tial additions of a nonionic detergent (Tri- 
ton X-100) and a protease (trypsin). The 
DNA was denatured with alkali and then 
soybean trypsin inhibitor was added to pre- 
vent proteolysis of the added ligase. 

The described ligation reactions were per- 
formed at 37°C, ~25 K below the melting 
temperature of the hybridized oligonucleo- 
tides, permitting the use of standardized 
assay conditions independent of the particu- 
lar sequence investigated. The observed 
specificity is a consequence of the require- 
ment for the simultaneous hybridization of 
both oligonucleotides in a precisely juxta- 
posed position. Although both oligonucleo- 
tides are likely to hybridize to numerous 
sequences in the DNA sample, they are 
unlikely to do so in the appropriate head-to- 
tail fashion except where the proper target 
sequence is present. In addition, we have 
found that the ligation reaction requires that 
the two terminal nucleotides on either side 
of the junction of the two oligonucleotides 
be engaged in correct base-pairing. This 
requirement further suppresses incorrect li- 
gation events. 

To determine whether any type of single 
nucleotide mismatch could be distinguished 
from correct base-pairing with the present 
method, we used four synthetic target mole- 
cules representing a segment of the β-globin 
gene, each with a different nucleotide in the 
first position of the sixth codon. Two of the 
sequences are derived from the β^A and β^C 
alleles of the β-globin gene. The other two 
sequences represent the other possible nu- 
cleotides occupying the variant position. 
Four pairs of oligonucleotides were de- 
signed to specifically identify one of the 
target molecules. Four oligonucleotide 
probes, each with a different nucleotide in 
the 3' terminal position and complementary 
to one of the target molecules, were sepa- 
rately assayed for their ability to be ligated 
to an invariant oligonucleotide that hybrid- 
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Fig. 1. A diagram depicting gene detection through the ligation of hybridized oligonucleotide probes. 
Target DNA is denatured and mixed with oligonucleotides and ligase. The ligase joins pairs of 
oligonucleotides annealed head to tail if they are correctly base-paired at the junction. Radioactivdy 
labeled oligonucleotides (*) are immobilized and detected by autoradiography only if ligated to 
biotinylated oligonucleotides (B) that can be bound to streptavidin on a solid support' 
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Fig. 2. (a) Nucleotide sequence and corresponding translated sequence of the oligonucleotides
used in the analysis described in (b). (b) Analysis for the presence of the globin β^A or β^S allele in
samples containing equal numbers of copies of β-globin alleles present in nucleated cells, in cloned
DNA, and in genomic DNA. Two × 10⁶ linearized plasmid molecules containing the β^A or β^S
allele of the human globin genes were added to individual microtiter wells containing 10 µg of
salmon sperm DNA in 4 µl of water (19). The microtiter plates were centrifuged and the
supernatants removed. To the resuspended cell pellet was added 1 µl of 10% Triton X-100 and 1 µl of trypsin
at 2 µg/µl. The samples were incubated at 37°C for 30 min and were denatured with alkali as above. The
pH was neutralized and 1 µl of soybean trypsin inhibitor (Sigma, 10 µg/µl) was added. Each well
received 140 fmol of biotinylated oligonucleotides B131 or B132 (20), specific for the globin β^A and β^S
genes, respectively, and 1.4 fmol of oligonucleotide P133, 5' end-labeled with [γ-³²P]adenosine
triphosphate (ATP) and polynucleotide kinase to a specific activity of 5 × 10⁸ Cerenkov cpm/µg and
purified over a Nensorb column (Du Pont Biotechnology Systems). T4 DNA ligase (0.05 Weiss unit,
Collaborative Research) was added in 2 µl of 5× ligase buffer to a final volume of 10 µl containing 50
mM tris-HCl (pH 7.5), 10 mM MgCl₂, 150 mM NaCl (including 50 mM added during denaturation),
1 mM spermidine, 1 mM ATP, 5 mM dithiothreitol, and 100 ng of bovine serum albumin per
microliter. The reagents were mixed by briefly centrifuging the microtiter plates before incubating at
37°C and 100% humidity for 5 hours. The ligated oligonucleotides were denatured by the addition of 1
µl of 1.1M NaOH and incubated for 10 min at 37°C. After the incubation, 1 µl of 1.1M HCl and 2 µl
of 10% SDS were added. Three microliters of a 15% (v/v) suspension of streptavidin-coated agarose
beads (Bethesda Research Laboratories) was then added, and the plate was incubated on a shaking
platform at room temperature for 5 min. The contents of the wells were transferred to a dot blot
manifold (Schleicher and Schuell) with a Whatman filter paper no. 4. In order to reduce nonspecific
binding of the labeled oligonucleotides, the filter papers had been boiled and the beads diluted in 0.5%
(v/v) dry nonfat milk, 1% SDS, and salmon sperm DNA (100 µg/ml). The beads (21) were washed
under suction in the manifold with 3 ml of 1% SDS and 1 ml of 0.1M NaOH per sample, with a 96-tip
dispenser (Vaccu-pette/96, Culture Tek). The filters were wrapped in plastic wrap and
autoradiographed for 3 days at -70°C with one enhancing screen (Du Pont).
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ized immediately 3' to the first oligonucleo-
tide. These reagents permit studying the
effect on ligation by any of the 16 possible 
base pairs, the 4 correct Watson-Crick pairs
and 12 mismatched pairs, in an invariant 
sequence context. Under the appropriate 
conditions, only nucleotides engaged in cor- 
rect base-pairing were efficiently joined by 
ligation (Fig. 3). Parameters that affected 
the nucleotide specificity were the salt con- 
centration and the amount of enzyme added 
relative to the DNA concentration. Higher 
salt concentration and lesser amounts of en- 
zyme than those found to be optimal for 
discrimination resulted in loss of signal. The
above experiment cannot exclude the possibil- 
ity that the identification of mismatched nu- 
cleotides may be influenced by the surround- 
ing sequence, although we have not yet en- 
countered any evidence for such effects. 
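
The count given above (16 possible base pairs, of which 4 are Watson-Crick pairs and 12 are mismatches) can be reproduced with a short enumeration. This is only an illustrative sketch; the names `BASES`, `WATSON_CRICK`, and `classify_pairs` are assumptions introduced here and are not part of the reported work.

```python
from itertools import product

BASES = "ACGT"
# The four correct Watson-Crick pairs, written as (probe base, target base).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def classify_pairs():
    """Split all probe/target base combinations into matched and mismatched."""
    matched, mismatched = [], []
    for pair in product(BASES, repeat=2):
        if pair in WATSON_CRICK:
            matched.append(pair)
        else:
            mismatched.append(pair)
    return matched, mismatched


matched, mismatched = classify_pairs()
print(len(matched), len(mismatched))  # 4 12
```

Each tuple corresponds to one probe/target combination at the ligation junction, so the 16 tuples mirror the 16 combinations that the four probes and four targets allow.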

Although autoradiographic techniques 
are relatively simple to implement, a gene 
detection assay based on the use of fluores- 
cent rather than radioactive probes would 
have the advantages of safe handling, more 
stable reagents, and rapid access to the re- 
sults, and would allow for multicolor analy- 
sis by using fluorophores with different 
emission spectra. In general, conventional 
organic fluorophores are less sensitive labels 
than ³²P. Thus we increased the amount of
target DNA before the detection assay with 
the polymerase chain reaction (14). With 



this procedure a segment of DNA can be 
exponentially amplified by repeated cycles of 
enzymatic synthesis of new strands from 
two oligonucleotide primers, one with a 
sequence derived upstream and the other in 
the opposite orientation downstream of the 
segment of interest. Genomic DNA was 
obtained from three human cell lines, 
MOLT-4, which is homozygous for the β^A-
globin allele; SC-1, homozygous for the β^S
allele; and GM2064, in which the β-globin
locus has been deleted (15). The appropriate
segment of the β-globin gene was amplified
in 25 cycles from 1 µg of genomic DNA
from each cell line. We used 3-µl aliquots,
equivalent to 24 ng of genomic DNA, for
the assay. Two oligonucleotides, specific for
the β^A and β^S alleles and differentially 5'-
labeled with one of two fluorophores, were
present at equal concentrations. The amount 
of each of these oligonucleotides that be- 
came ligated to a third oligonucleotide hy- 
bridizing downstream of the other two was 
determined by separating the reaction prod- 
ucts on an 8% polyacrylamide gel and ana- 
lyzing the band migrating as a 40-nucleotide 
oligomer (the size of two ligated oligonucle- 
otides) for the relative contribution by the 
two different fluorophores [model 370A
DNA sequencer, Applied Biosystems, Fos-
ter City, California (16)]. No signal was
observed when the β-globin gene had been
deleted in the cell from which the DNA was 



obtained, whereas only the correct fluoro-
phore-labeled oligonucleotide was ligated
when the cells harbored the β^A or β^S alleles



B128 (β^A)  5' B CATGGTGCACCTGACTCCTG pAGGAGAACTCTGCCGTTACT 3'  P129
B134 (β^C)  5' B A
B136        5' B T
B137        5' B C

172 (β^A)   3'
138 (β^C)   3'
139         3'
140         3'

Fig. 3. (a) Nucleotide sequence of the oligonucle- 
otides used in the analysis described in (b). (b) 
Correct identification of four target molecules, 
differing by single-nucleotide substitutions in one 
position. Letters refer to the variable nucleotides 
in the probe and target sequences. As target 
molecules, 40-nucleotide oligomers, derived from 
the β-globin gene sequence, were synthesized.
The oligonucleotides 172, 138, 139, and 140 are 
of identical sequence except in a central position 
where each target molecule includes a different 
nucleotide. Four 20-nucleotide biotinylated
oligomers, B128, B134, B136, and B137, differ- 
ing only in their 3' nucleotide position, were 
designed to hybridize to the 3' half of the target 
molecules such that the variant position of the
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probe reagents corresponds to that of the target molecules. Each of the biotinylated oligonucleotides 
was used in conjunction with oligonucleotide P129, 5' end-labeled with ³²P and hybridizing
immediately 3' to the biotinylated probes on the target strands. The assays were performed essentially as
described in the legend to Fig. 2, but 2 × 10⁸ copies of one of the target molecules were added to each
well with 10 µg of salmon sperm DNA. Each well further received one of the biotinylated
oligonucleotides together with oligonucleotide P129. The final NaCl and ligase concentrations were
varied as indicated. 




[Fig. 4 plot; x-axis: length in nucleotides (ticks at 20, 30, 40)]

Fig. 4. Demonstration of the presence of the β^A
and β^S alleles of the β-globin gene in amplified
genomic DNA by probes labeled with fluorescent
dyes. A 120-bp segment of the β-globin gene was
amplified with the polymerase chain reaction as
described (16) in 25 cycles starting with 1 µg of
genomic DNA from the cell lines MOLT-4, SC-1,
and GM2064 (β^A, β^S, and β^0, respectively)
in 100 µl. Three microliters of each amplified
sample was added to an Eppendorf tube, dena-
tured by alkali, neutralized, and incubated with
14 fmol each of oligonucleotide 131 labeled with
carboxy-fluorescein (Molecular Probes) (CF131)
and oligonucleotide 132 labeled with
carboxy-2',7'-dimethoxy-4',5'-dichlorofluorescein
(CD132), and 14 fmol of nonradioactively
5' phosphorylated oligonucleotide P133 (for
sequences, see Fig. 2). The reaction conditions
were essentially as described in Fig. 2, but 0.5
Weiss unit of T4 DNA ligase was added to each
assay. At the end of the 3-hour incubation, the
samples were ethanol precipitated, taken up in
50% formamide, and loaded on a sequencing gel
in an ABI 370A automated DNA sequencer. The
fluorescence signal was processed to distinguish
the partially overlapping emission spectra of the
two fluorophores and to determine the relative
contribution of each fluorophore to the signal.
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(Fig. 4). This strategy could be generalized 
to the simultaneous analysis of several loci. 
For each set of two labeled, allele-specific
oligonucleotides and one unlabeled, the lat- 
ter is given a nonhybridizing 3' sequence 
extension of a unique length. This results in 
different migration rates for the ligation 
products, characteristic of each locus. 
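
The multiplexing idea described above can be illustrated with a small calculation. The sketch assumes two 20-nucleotide probes per ligation product (consistent with the 40-nucleotide product mentioned for the β-globin assay) and a hypothetical spacing of 5 nucleotides between extension lengths; the function names, locus labels, and the spacing are illustrative assumptions, not values from the article.

```python
def ligation_product_length(labeled_len, unlabeled_len, extension_len):
    """Length of a ligated product whose unlabeled probe carries a 3' extension."""
    return labeled_len + unlabeled_len + extension_len


def assign_product_lengths(loci, probe_len=20, extension_step=5):
    """Give every locus a unique extension, hence a unique product length.

    probe_len and extension_step are hypothetical parameters for illustration.
    """
    return {
        locus: ligation_product_length(probe_len, probe_len, i * extension_step)
        for i, locus in enumerate(loci)
    }


lengths = assign_product_lengths(["locus_1", "locus_2", "locus_3"])
print(lengths)  # {'locus_1': 40, 'locus_2': 45, 'locus_3': 50}
```

Because each locus maps to a different product length, the two allele-specific fluorophores can be read separately in each band on the gel.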

In contrast to gene detection techniques 
based on immobilizing the target DNA, 
such as DNA blots, the hybridization re- 
ported here was performed in solution and 
in a small volume, which reduced the time 
required for hybridization (17). It also obvi- 
ated the step of immobilizing the target 
DNA. Both ligation and binding of the 
biotinylated oligonucleotides are efficient 
and rapid steps that should permit quantita- 
tive detection of target molecules. In gener- 
al, there are three rate-limiting steps in gene 
detection techniques. The first is sample 
preparation, which can be greatly simplified 
as demonstrated here. The second is the 
time required for the probes to anneal to the 
target sequence. This is a function of the 
concentration of the probe and can be re- 
duced considerably. The third and most 
time-consuming step in the present tech- 
nique is signal detection by autoradiogra- 
phy. A sufficiently sensitive fluorescent de- 
tection method (18) should drastically re- 
duce this time, permitting the development 
of a rapid, automated gene detection proce- 
dure. 
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Amyloid Protein Precursor Messenger RNAs: 
Differential Expression in Alzheimer's Disease 

M. R. Palmert, T. E. Golde, M. L. Cohen, D. M. Kovacs,
R. E. Tanzi, J. F. Gusella, M. F. Usiak, L. H. Younkin, 
S. G. Younkin* 

In situ hybridization was used to assess total amyloid protein precursor (APP) 
messenger RNA and the subset of APP mRNA containing the Kunitz protease 
inhibitor (KPI) insert in 11 Alzheimer's disease (AD) and 7 control brains. In AD, a 
significant twofold increase was observed in total APP mRNA in nucleus basalis and 
locus ceruleus neurons but not in hippocampal subicular neurons, neurons of the basis 
pontis, or occipital cortical neurons. The increase in total APP mRNA in locus ceruleus 
and nucleus basalis neurons was due exclusively to an increase in APP mRNA lacking
the KPI domain. These findings suggest that increased production of APP lacking the 
KPI domain in nucleus basalis and locus ceruleus neurons may play an important role 
in the deposition of cerebral amyloid that occurs in AD. 



Alzheimer's disease (AD) is 
characterized pathologically by large 
numbers of senile plaques and neu- 
rofibrillary tangles throughout the cerebral 
cortex and hippocampus. Senile plaques 
consist of clusters of degenerating neurites 
surrounding an amyloid core composed of 
5- to 10-nm fibrils that stain metachromati- 
cally with Congo red. In many cases of AD, 
amyloid fibrils are also found in vessel walls 
(1). A 4.2-kD polypeptide, referred to as A4
or the β protein, has been isolated from the
amyloid fibrils found in senile plaques (2) 
and vessel walls (3) of patients with AD. 
There is evidence that A4 may also be a 
component of the paired helical filaments 
found in neurofibrillary tangles (4). 

The gene encoding A4, which is located 
on chromosome 21 (5), produces at least 
three mRNAs (Fig. 1) referred to as APP695,
APP751, and APP770 (6-8). APP695, the
mRNA that was initially identified (5), en- 



codes an amyloid protein precursor (APP), 
695 amino acids in length, that includes A4 
at positions 597 to 638. APP751 is identical 
to APP695, except for a 168-nucleotide in-
sert (6-8). This insert, previously referred to
as HL124i (7), would introduce 56 amino
acids carboxyl terminal to Arg288 and con-
vert Val289 into an isoleucine. APP770 is
identical to APP751, except for a 57-nucleo- 
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In a novel method for analysing mutations, allele specific 
oligonucleotides (ASOs) are synthesised in stripes on the surface 
of a glass plate and single-stranded ³⁵S-labelled RNA probes
applied in orthogonal stripes. We have tested the approach using 
the well studied example of the sickle cell mutation in the human 
β-globin gene.

Detection of sequence variation in DNA has applications in 
linkage analysis, in the analysis of inherited diseases, in genetic 
fingerprinting, and in studies of evolution. As full sequence 
analysis is time consuming and expensive, several methods have 
been developed to analyse variation directly. Some are based on 
the hybridisation of short oligonucleotides to the test sequence 
(1). One advantage of this approach is that it can detect both the 
mutant and the wild-type sequence in a single analysis. In others 
the oligonucleotides may be bound to a solid support and probed 
with the labelled test sequence (2, 3). 

In some applications, such as the analysis of a common 
mutation as may be the case in the haemoglobinopathies (4), there 
is a need to analyse many samples with a few oligonucleotides; 
in others, for example in linkage analysis using ASOs instead 
of RFLPs (5), there is a need to analyse a few samples with many 
oligonucleotides; and in yet others, for example population 
screening for mutations in genes such as the CFTR gene with 
many alleles (6), there is need to analyse many samples with many 
oligonucleotides. This communication describes a versatile 
approach which can be adapted to any of these applications. 

Our method for synthesising oligonucleotides on glass plates 
(7, 8) was used to produce stripes of ASOs 15 nt long for the 
A, C and S alleles of the β-globin sickle cell locus. The stripes
were 2 mm wide and 150 mm long made using the device shown 
in Figure 1 on 3 mm thick window glass. 

Four different single-stranded RNA probes covering the site 
of the β-globin mutation were prepared as follows: Carrier and
patients' DNAs (a gift from Dr J. Old, John Radcliffe Hospital,
Oxford) and wild-type control were amplified using a standard 



PCR procedure (25 cycles; 55°C, 2 min; 72°C, 2 min; 94°C, 
2 min; Cetus PCR machine), Figure 2, to give a 162 bp product. 
The 46 nt upstream primer consisted of 20 nt of β-globin sequence
and a 26 nt T7 polymerase promoter clamp at the 5' end, and
the downstream primer of 20 nt of β-globin sequence and a 26
nt SP6 RNA polymerase clamp to allow separate transcription
of either the 'sense' or 'anti-sense' strand.
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Figure 2. Strategy used to generate a labelled single-stranded RNA probe from 
genomic DNA. The primers were: upstream primer, TTC TAA TAC GAC TCA
CTA TAG GGA GA ACA CAA CTG TGT TCA CTA GC; downstream primer,
CTT AAT TAG GTG ACA CTA TAG AAT AG CAA CTT CAT CCA CGT
TCA CC. Standard PCR buffer containing 2 mM Mg²⁺ and AmpliTaq
polymerase were used. 
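
As a consistency check on the primer description (46 nt in total, made of a 26 nt promoter clamp followed by 20 nt of globin sequence), the sequences in this legend can be split programmatically. This is a minimal sketch; `split_primer` and the constant names are hypothetical helpers introduced here, and the split point simply follows the lengths stated in the text.

```python
CLAMP_LEN = 26  # promoter clamp length stated in the text
GENE_LEN = 20   # globin-specific length stated in the text

# Primer sequences as printed in the legend, spaces removed.
UPSTREAM = "TTC TAA TAC GAC TCA CTA TAG GGA GA ACA CAA CTG TGT TCA CTA GC".replace(" ", "")
DOWNSTREAM = "CTT AAT TAG GTG ACA CTA TAG AAT AG CAA CTT CAT CCA CGT TCA CC".replace(" ", "")


def split_primer(primer, clamp_len=CLAMP_LEN):
    """Return (clamp, gene_specific) parts of a 5'-clamped primer."""
    return primer[:clamp_len], primer[clamp_len:]


for name, primer in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    clamp, gene_specific = split_primer(primer)
    print(name, len(primer), len(clamp), len(gene_specific))
```

Both primers come out at 46 nt with a 20 nt gene-specific 3' portion, which agrees with the description in the main text.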




Figure 1. Set-up used to synthesise oligonucleotides in lines on a glass plate.

Figure 3. Device used to apply solutions in channels orthogonal to the
oligonucleotide lines. Individual channels were 3.5 mm wide.
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Transcription was carried out using the Promega Riboprobe 
kit according to manufacturer's instructions, with [amount illegible] µCi
(Amersham) and no unlabelled UTP, for 30 min at
[temperature illegible]. The mixture was spun through a Sephadex G-[illegible] STE
column. Equal amounts of radioactive transcript were used in 
each hybridisation. 

Several probes can be analysed simultaneously by applying
solutions in stripes across the stripes of oligonucleotides on the
surface of the microscope slide. The stripes were formed by
putting the plate on the device shown in Figure 3, made from
plexiglass. The hybridisation solution (10 µl, 0.1 M NaCl in TE
pH 7.5, containing 0.2% SDS) was run into the line of contact
between the plexiglass and the glass by capillary action.
Hybridisation was for 2 hrs at room temperature. The plate was
rinsed in 0.1 M NaCl, eluted at 43°C for 10 min, and exposed
to a PhosphorImager storage phosphor screen overnight, scanned
on a Molecular Dynamics PhosphorImager and printed (Figure
4). The results for all individuals are clear and as expected.

The method has several advantages over alternatives. It allows
for multiple comparisons to be carried out in a single simple
experiment. The number of oligonucleotides that could be
synthesised and the number of probes analysed are determined
by the size of the glass plate and the width of the stripes. On
a 200 mm × 200 mm plate it should be possible to synthesise 100
different sequences and test probes from 50 different individuals.
This level of highly parallel analysis is potentially a lot higher
than the method introduced by Erlich et al. (9) which involves
the hybridisation of one PCR product at a time to oligonucleotides
UV crosslinked to strips of membrane.
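
The plate-capacity estimate can be checked with simple arithmetic. The sketch below assumes stripes packed edge to edge with no gap, which gives an upper bound; the function name `stripes_that_fit` and the optional `gap_mm` parameter are hypothetical. The stripe widths (2 mm for oligonucleotide stripes, 3.5 mm for probe channels) are those quoted in the text and in the legend to Figure 3.

```python
import math


def stripes_that_fit(plate_mm, stripe_mm, gap_mm=0.0):
    """Upper bound on the number of parallel stripes across one plate edge.

    gap_mm is a hypothetical spacing between neighbouring stripes.
    """
    return math.floor((plate_mm + gap_mm) / (stripe_mm + gap_mm))


oligo_stripes = stripes_that_fit(200, 2.0)     # 2 mm oligonucleotide stripes
probe_channels = stripes_that_fit(200, 3.5)    # 3.5 mm probe channels
print(oligo_stripes, probe_channels)           # 100 57
```

With no gaps, 100 oligonucleotide stripes fit exactly, and 57 probe channels is an upper bound, so the figure of 50 individuals leaves room for the walls between channels.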

The manipulations are simple to carry out manually, although
the method would lend itself to automation. The analysis is rapid:
the [illegible] procedure, with genome [illegible]
up to having the final result in less than a working day. It
is versatile and can be applied to any locus for which there is
sufficient information to produce oligonucleotides for test and
[illegible] or for mutations
in the p53 gene. The glass plates, unlike filters, are stable; we
have reused them more than thirty times with no loss of
performance. The interpretation of the result is straightforward
and can readily be automated and quantified.
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RESEARCH ARTICLE 

Detection of Single Base Substitutions by 
Ribonuclease Cleavage at Mismatches 
in RNA:DNA Duplexes 

Richard M. Myers, Zoia Larin, Tom Maniatis 



Physical methods for detecting single 
base substitutions have provided power- 
ful tools for the analysis of human genet- 
ic diseases (1-4) and the establishment of 
human genetic linkage maps (5-7). These 
techniques could also be of considerable 
value in the detection and analysis of 
single base mutations in regulatory or 
protein-coding sequences. Procedures 
available for detecting base substitutions 
rely on differences in restriction endonu-
clease cleavage sites (8-12), or on differ-
ences in the melting behavior of wild-
type and mutant DNA duplexes (13-21). 
For example, some single base substitu- 
tions result in the loss or gain of a 
restriction endonuclease cleavage site, 
and can therefore be detected in South- 
ern blotting experiments (8-12). Howev- 
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er, it is usually necessary to use a large 
number of different restriction enzymes 
before a change is detected. In addition, 
many substitutions cannot be detected 
by this procedure because they do not 
alter a restriction site. Another approach 
involves the use of synthetic oligodeox-
yribonucleotides as differential hybrid-
ization probes (13-16). In this method, a
labeled synthetic oligonucleotide ho- 
mologous to the mutant or wild-type 
DNA is hybridized to blotted genomic 
DNA. Hybridization or washing condi- 
tions are then adjusted to allow the dif- 
ferential melting of the mismatched and 
perfectly paired duplexes. This method 
is useful for scoring substitutions at spe- 
cific locations, but is not practical for 
screening large regions of DNA for new 
mutations or polymorphisms. 

Differential DNA melting is also the
basis for detecting single base substitu-
tions by denaturing gradient gel electro-
phoresis (17-21). In this method, wild-
type and mutant DNA molecules are
separated by electrophoresis in poly- 



acrylamide gels containing a gradient of 
formamide and urea. Duplex DNA frag- 
ments move through these gels with a 
constant mobility determined by molecu- 
lar weight until they migrate into a por- 
tion of the gel containing a denaturant 
concentration sufficient to melt the 
DNA. When the DNA undergoes melt- 
ing, its electrophoretic mobility abruptly 
decreases. Thus, the final position of a 
DNA fragment in the gel is determined 
by its melting temperature. The differ- 
ence in melting temperature between 
two fragments that differ by a single base 
change is sufficient to allow separation 
on the gel. Even greater separation is 
achieved with DNA duplexes containing 
a single base mismatch (18). With spe- 
cially designed plasmid vectors, virtually 
all possible single base substitutions can 
be detected in cloned DNA fragments 
(19, 20). However, for technical reasons
(75-27), only 25 to 40 percent of all 
possible substitutions can be detected 
directly in total genomic DNA. 

Because of the limitations in the pro- 
cedures discussed above, we developed 
an alternative method for detecting sin- 
gle base substitutions in cloned and ge- 
nomic DNA. This method involves the 
enzymatic cleavage of RNA at a single 
base mismatch in an RNA:DNA hybrid. 
The strategy used is based on the devel- 
opment of methods for synthesizing 
RNA probes (22-24), and on the obser- 
vation that many ribonucleases are spe- 
cific for single-stranded RNA under ap- 
propriate reaction conditions (25). A 
similar strategy had been developed ear-
lier to detect mutations in duplex DNA
containing single base mismatches (26, 
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27). In this case, attempts were made to
cleave DNA:DNA mismatches with the 
single-strand specific nuclease S1. How-
ever, only a small amount of cleavage
occurs at a few mismatches while most 
mismatches are not cleaved at all (26, 
27). In this article, we demonstrate that 
many single base mismatches in 
RNA:DNA hybrids are cleaved specifi-
cally by ribonuclease A (RNase A).

The steps in the RNase cleavage pro-
cedure are outlined in Fig. 1. A ³²P-
labeled RNA probe is synthesized from a
wild-type DNA template with the SP6 
transcription system (22-24, 28). The 
RNA probe is hybridized to denatured 
test DNA (29) in solution, and the result- 
ing RNA:DNA hybrid is treated with 
RNase A (30). The RNA products are 
then analyzed by electrophoresis in a 
denaturing gel. If the test DNA is identi- 
cal to wild-type DNA, a single band is 
observed in the autoradiogram of the gel, 
since the RNA:DNA hybrid is not
cleaved by RNase. However, if the test 
DNA contains a single base substitution 
that results in a mismatch recognized by 
RNase A, two new RNA fragments will 
be detected. The total size of these frag- 
ments should equal the size of the single 
RNA fragment observed with wild-type 
DNA. Thus, the mutation can be local- 
ized relative to the ends of the RNA 
probe by determining the sizes of the 
cleavage products. The end of the RNA 
probe mapping nearest to the substitu- 
tion can be determined when the experi- 
ment is performed with DNA digested by 
an additional restriction enzyme (29), 
thus localizing the substitution unambig- 
uously. 
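
The size relationship described above (the two cleavage fragments should add up to the length of the full probe, and each fragment size gives one candidate distance of the mismatch from a probe end) can be written as a short check. This is an illustrative sketch only; `mismatch_candidates` and its `tolerance` parameter are hypothetical names, and the example sizes are the fragment lengths reported later in this article.

```python
def mismatch_candidates(probe_len, fragment_lens, tolerance=0):
    """Return candidate distances (nt) of a mismatch from one end of the probe.

    Cleavage at a single mismatch should give two fragments whose sizes sum
    to the probe length. Without further information (for example, a second
    restriction digest) the mismatch lies at either distance from a given end.
    """
    if len(fragment_lens) != 2:
        raise ValueError("expected exactly two cleavage fragments")
    a, b = fragment_lens
    if abs(a + b - probe_len) > tolerance:
        raise ValueError("fragment sizes do not sum to the probe length")
    return sorted((a, b))


print(mismatch_candidates(186, (66, 120)))   # [66, 120]
print(mismatch_candidates(615, (430, 185)))  # [185, 430]
```

The two returned values reflect the ambiguity that the text resolves by repeating the experiment with DNA digested by an additional restriction enzyme.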

For convenience, single base mis- 
matches in the RNA:DNA hybrids are 
presented as X: Y, where X and Y desig- 
nate the mismatched RNA and DNA 
bases, respectively. For example,
"C:A" refers to a mismatch in which
cytosine appears in the RNA strand op- 
posite adenine in the DNA strand. 

Detection of single base substitutions in 
cloned DNA fragments. To establish opti- 
mal conditions for recognizing single 
base mismatches, and to determine 
which types of mismatches can be 
cleaved by RNase, we examined a large 
number of single base substitutions in 
the mouse β-major globin promoter re-
gion (21, 31). With this collection, it was
possible to examine all 12 types of mis- 
matches possible in RNA:DNA hybrids 
in several different sequence contexts. 
The results of several RNase cleavage 
reactions are shown in Fig. 2. The RNA 
probe used in these reactions is comple-
mentary to the sense strand of the β-
globin promoter, and therefore is desig-



Abstract. Single base substitutions can be detected and localized by a simple and 
rapid method that involves ribonuclease cleavage of single base mismatches in 
RNA:DNA heteroduplexes. A ³²P-labeled RNA probe complementary to wild-type
DNA is synthesized in vitro and annealed to a test DNA containing a single base 
substitution. The resulting single base mismatch is cleaved by ribonuclease A, and 
the location of the mismatch is then determined by analyzing the sizes of the 
cleavage products by gel electrophoresis. Analysis of every type of mismatch in 
many different sequence contexts indicates that more than 50 percent of all single 
base substitutions can be detected. The feasibility of this method for localizing base 
substitutions directly in genomic DNA samples is demonstrated by the detection of 
single base mutations in DNA obtained from individuals with β-thalassemia, a
genetic disorder in β-globin gene expression.



nated an "antisense" probe. When this 
probe was annealed to the wild-type pro- 
moter fragment and then digested with 
RNase A, a single, full-length RNA frag- 
ment of 186 nucleotides (nt) was ob- 
served (Fig. 2, lane 1). In some experi- 
ments, faint background bands were visi- 
ble in the wild-type lane, indicating that a 
low level of cleavage occurs at bases that 
are not mismatched. In contrast, when 
an RNA:DNA duplex containing a C:A 
mismatch at position -40 in the promot- 
er was analyzed, three bands were ob- 
served (Fig. 2, lane 2). One of these 
bands, representing about 50 percent of
the total radioactivity in the lane, corre- 
sponds to the full-length RNA probe. 
The lengths of the other two RNA frag- 
ments correspond to the sizes expected 
for cleavage at the mismatch at position 
-40 in the promoter (66 and 120 nt). In 
this and other mismatches examined, 
one of the RNA fragments (the 66-nt 
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fragment) appears as a doublet on the 
autoradiogram, which is probably the 
result of further reaction of RNase at 
pyrimidines near the ends of the cleaved 
RNA product. 

Similar results were obtained with an- 
other C:A mismatch located at position 
-60 in the promoter (Fig. 2, lane 4). In 
contrast, in the case of a third C:A 
mismatch, occurring at -56 in the pro- 
moter, all of the radioactivity is present 
in the two cleavage products, indicating 
that 100 percent of the mismatches were 
cleaved under the same conditions (Fig. 
2, lane 3). Altogether, 21 different C:A 
mismatches in the promoter were tested, 
and more than 50 percent of each mis- 
match was cleaved by RNase A in every 
case (Table 1). Similar results were ob-
tained with C:C and C:T mismatches
(Fig. 2, lanes 5 to 7, and Table 1). In 
contrast, only six of ten U:G mismatches 
in the promoter were cleaved by RNase, 
and the efficiency of cleavage varied 
from 10 to 90 percent (Fig. 2, lanes 8, 9, 
and 11, and Table 1). Three U:C mis-
matches were tested, and in each case
cleavage was very inefficient (only 5 to 
10 percent; lane 10 and Table 1). Three 
U:T mismatches in the promoter were 
cleaved at a level of 25 percent (Table 1). 

Fig. 1. Detection and localization of single 
base substitutions by the RNase cleavage 
procedure. A labeled RNA probe is synthe- 
sized with the use of the SP6 transcription 
system. Double-stranded DNA is digested 
with restriction enzymes that cleave outside 
the region covered by the probe and then 
denatured and annealed to a large molar ex- 
cess of the RNA probe. Digestion of the 
hybridization mixture with RNase removes all 
of the unhybridized RNA probe and cleaves 
the specific RNA:DNA duplex at the position
of the mismatched base. The RNase resistant 
products are then size-fractionated by gel 
electrophoresis and detected by autoradiogra- 
phy. In the absence of a mutation, the full- 
length probe fragment is observed. If the test 
DNA contains a single base mutation, cleav- 
age at the resulting mismatch generates two 
RNA fragments whose total lengths are equal 
to that of the probe. 
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Several G:G, G:A, G:T, A:A, A:C, 
and A:G mismatches were tested and no 
cleavage by RNase A was observed in 
most cases (for example, see Fig. 2, lane 
15). However, a small amount (10 to 20 
percent) of cleavage occurred at two 
A: A mismatches and one G:T mismatch, 
and efficient cleavage occurred at three 
A:C and two A:G mismatches (Table 1). 
It is surprising that cleavage occurred at 
these mismatches since RNase A cleaves 
after pyrimidines (25). However, it is 
possible that destabilization of the mis-
matched RNA:DNA duplex leads to
cleavage at nearby pyrimidine bases. 

To determine whether this procedure 
can be used to detect small deletions, we 
analyzed several promoter fragments 
containing different single base dele- 
tions. In each case, nearly complete 
cleavage of the probe was observed at 
the resulting single base "loop-out," or 
at nearby pyrimidines (Fig. 2, lane 12, 
and Table 1). Similarly, RNA:DNA du- 
plexes containing two mismatches in 
close proximity were efficiently cleaved 
in the assay (Fig. 2, lane 13, and Table
1). 

Detection of thalassemia mutations in 
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Fig. 2. RNase cleavage analysis of single base
mutations in a cloned mouse β-major globin
promoter fragment. A 186-nt antisense RNA
probe was annealed to wild-type and mutant
promoter fragments and the resulting
RNA:DNA duplexes treated with RNase A.
The digestion products were analyzed by
polyacrylamide gel electrophoresis and auto-
radiography. The DNA sample analyzed in
each lane was: wild-type (lane 1); mutant
-40A (lane 2); -56A (lane 3); -60A (lane 4);
-33T (lane 5); -25T (lane 6); -54T (lane 7);
-57G (lane 8); -31G (lane 9); -62C (lane 10);
-26G (lane 11); -76 deletion (lane 12); -28G/
-26G (lane 13); -25A (lane 14); -49G (lane
15). The type of mismatch produced by an-
nealing the wild-type antisense RNA probe to
each mutant DNA fragment is indicated at the
top of each lane.




cloned and genomic DNA. To establish 
the feasibility of detecting single base 
mutations associated with human genetic 
diseases, we analyzed a number of dif-
ferent cloned and genomic DNA's bear-
ing β-thalassemia or sickle cell anemia
mutations. In these experiments, the 
RNA probes used were about 615 nt in 
length, spanning the region of the gene 
and 5' flanking sequences from -128 to 
+485 (32). Two RNA probes were syn-
thesized to test both the sense and anti-
sense strand of the region. With this set
of substitutions and probes, 10 of the 12 
types of RNA:DNA mismatches could 
be formed, and 7 out of the 10 types were 
cleaved to some extent by RNase (Table 
1). 

To determine whether the RNase 
cleavage procedure could be used to 
detect single base substitutions in total 
genomic DNA, we analyzed DNA sam-
ples from two individuals with β-thalas-
semia. One individual carried a C to T
transition at codon 39 of the β-globin
gene in both chromosomes. The second
individual was homozygous for the he-
moglobin β^E (HbE) allele, which con-
tains a G to A transition at codon 26 in
the gene. The codon 39 (β°39) DNA was
tested with the sense strand RNA probe, 
whereas the HbE DNA was tested with 
the antisense RNA probe. Both of these 
hybrids result in C:A mismatches with 
their corresponding probes. When a con- 
trol experiment was performed with the 
sense probe and genomic DNA from an 
individual with wild-type β-globin genes,
a single band appearing at the full-length 
position resulted (Fig. 3A, lane 1). When 
DNA from the individual homozygous 
for the β°39 mutation was analyzed,
RNA fragments 430 and 185 nt in length 
were observed (Fig. 3A, lane 2), indicat- 
ing that cleavage at the C:A mismatch 
occurred at a high efficiency. Similar 
results were obtained with the analogous 
cloned DNA samples (Fig. 3A, lanes 3 
and 4). In another experiment with the 
antisense RNA probe, genomic DNA 
from an individual with normal β-globin
genes also resulted in a single band ap- 
pearing at the full-length probe position 
(Fig. 3B, lane 1). Genomic DNA from a 
patient homozygous for the HbE allele 
resulted in two RNA fragments of the 
expected sizes of 355 and 260 nt (Fig. 3B, 
lane 2), again indicating complete cleav- 
age of the mismatch by RNase A. These 
results were obtained with 3 µg of total
genomic DNA, and RNA probes with an
[α-³²P]GTP specific activity of 400 Ci/
mmol. A signal could be clearly detected
after a 24-hour autoradiographic expo- 
sure. These experiments therefore estab- 
lish the feasibility of detecting single 



base mutations and linked polymor- 
phisms in genomic DNA with this meth- 
od, at a level of sensitivity at least com- 
parable with existing techniques. 
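
The fragment sizes reported above can be checked with simple arithmetic. The following Python sketch assumes that RNase A cuts the protected probe once, at the mismatch, so that the two fragment lengths add up to the protected probe length. The 615-nt length is inferred here from the reported fragments (430 + 185 and 355 + 260); it is not stated in the text, and the function names are illustrative only.

```python
def implied_probe_length(fragment_sizes_nt):
    """Protected probe length implied by a single cut (sum of the fragments)."""
    return sum(fragment_sizes_nt)


def expected_fragments(probe_length_nt, mismatch_offset_nt):
    """Sizes of the two fragments produced by one cut, largest first.

    mismatch_offset_nt is the distance of the mismatch from one end of the
    protected probe (which end is an assumption of this sketch).
    """
    if not 0 < mismatch_offset_nt < probe_length_nt:
        raise ValueError("mismatch must lie inside the protected probe")
    return sorted(
        (mismatch_offset_nt, probe_length_nt - mismatch_offset_nt), reverse=True
    )


# Fragment sizes reported for the codon 39 (sense probe) and HbE
# (antisense probe) experiments.
sense_length = implied_probe_length((430, 185))
antisense_length = implied_probe_length((355, 260))
print(sense_length, antisense_length)
print(expected_fragments(sense_length, 185))
```

Both fragment pairs imply the same protected length, which is what one would expect if the sense and antisense probes cover the same region.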

Analysis of mismatch recognition. We 
find that 4 (C:A, C:C, C:T, and U:T) out 
of the 12 possible types of mismatches 
are recognized efficiently by RNase A in 
all sequence contexts tested. Thus, ap- 
proximately one-third of all possible sin- 
gle base substitutions can be detected 
with an RNA probe homologous to one 
strand of the test DNA. This number can 
be doubled with the use of a second 
RNA probe, homologous to the opposite 
strand of the test DNA. For example, a 

Fig. 3. RNase cleavage analysis of human
genomic DNA samples from individuals with
β-thalassemia. (A) Analysis of the β°39 thalassemia
mutation. Genomic and cloned DNA
from an individual with wild-type β-globin
genes and an individual homozygous for a
nonsense mutation in codon 39 were analyzed
by the RNase cleavage procedure with a
sense strand RNA probe. An autoradiogram
of the RNase digestion products is shown.
Genomic wild-type β-globin DNA (lane 1);
genomic β°39 DNA (lane 2); cloned wild-type
β-globin DNA (lane 3); cloned β°39 DNA
(lane 4). The sizes of the RNase digestion
products are indicated. (B) Analysis of the
HbE thalassemia mutation. Genomic DNA
from an individual with wild-type β-globin
genes and an individual homozygous for HbE
were analyzed by the RNase cleavage procedure
with an antisense RNA probe. Genomic
wild-type β-globin DNA (lane 1); genomic HbE
DNA (lane 2).

G:T mismatch formed between one
strand of the test DNA and the homolo- 
gous RNA probe may not be cleaved by 
RNase A. However, when the other 
DNA strand is hybridized to its homolo- 
gous RNA probe, the C:A mismatch at 
that same position will be cleaved by 
RNase. Thus, approximately two-thirds 
of all possible single base substitutions 
should be detected. This is clearly a 
minimum estimate, since we have ob- 
served cleavage at seven of the remain- 
ing eight possible types of mismatches in 
some sequence contexts. 
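
The one-third and two-thirds estimates can be reproduced by enumeration. The Python sketch below assumes that mismatches are written RNA:DNA (as in the G:T and C:A example above) and treats only the four efficiently cleaved types (C:A, C:C, C:T and U:T) as detectable. The function names and the convention of describing each substitution on the sense DNA strand are choices made for this illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Mismatches (RNA base, DNA base) cleaved efficiently in all contexts tested.
EFFICIENT = {("C", "A"), ("C", "C"), ("C", "T"), ("U", "T")}


def rna(base):
    """RNA equivalent of a DNA base."""
    return "U" if base == "T" else base


def mismatches(wild, mutant):
    """Mismatches formed by a wild -> mutant substitution on the sense strand.

    The sense RNA probe pairs with the mutant antisense DNA strand; the
    antisense RNA probe pairs with the mutant sense DNA strand.
    """
    sense_probe = (rna(wild), COMPLEMENT[mutant])
    antisense_probe = (rna(COMPLEMENT[wild]), mutant)
    return sense_probe, antisense_probe


def count_detectable():
    """Return (detected by one probe, detected by either probe, total)."""
    one_probe = either_probe = total = 0
    for wild in "ACGT":
        for mutant in "ACGT":
            if wild == mutant:
                continue
            total += 1
            sense, antisense = mismatches(wild, mutant)
            one_probe += sense in EFFICIENT
            either_probe += sense in EFFICIENT or antisense in EFFICIENT
    return one_probe, either_probe, total


print(mismatches("G", "A"))
print(count_detectable())
```

Under these assumptions, 4 of the 12 substitutions are detected with one probe and 8 of 12 with both, in line with the estimates given in the text.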

We do not understand why some mis- 
matches can be cleaved in some se- 
quence contexts but not in others. It 
seems likely that differences in accessi- 
bility to cleavage are the result of differ- 
ences in the overall structure of the 
mismatched duplex. However, we have 
not been able to discern a sequence 
pattern surrounding a mismatch that can 
be correlated with the observed efficien- 
cy of RNase cleavage. 

As indicated in Fig. 2 and Table 1, 
some mismatches are only partially 
cleaved in the assay. Our data were 
obtained by performing the RNase reac- 
tions for a fixed length of time (30 min- 
utes). In a time-course experiment, we 
found that many mismatches that are 
only partially cleaved in 30 minutes can 
be cleaved almost to completion in 90 
minutes under the same conditions and 
with only a slight increase in back- 
ground. However, mismatches not 
cleaved in 30 minutes are also not affect- 
ed by longer incubation times. Thus, it 
may be desirable to perform the RNase 
reactions for various lengths of time in 
cases where partial cleavage occurs. 

The temperature and ionic strength of
the solution in which the RNase reaction 
is performed also contribute to the de- 
gree of cleavage and the apparent effects 
of sequence context. In fact, altering the 
reaction conditions to higher tempera- 
ture and lower ionic strength results in 
cleavage at some mismatches that are 
not normally cleaved, and more com- 
plete cleavage of mismatches that are 
normally only partially cleaved. These 
reaction conditions may be desirable in 
some cases, but are not ideal since inter- 
nal cleavage at perfectly matched posi- 
tions also increases significantly. 

The fact that some mismatches are 
never or rarely cleaved by RNase A and 
that partial cleavage sometimes occurs 
led us to test the ability of other ribonu- 
cleases to cleave at mismatches. We 
have not detected any cleavage with 
RNase T1 and RNase T2 under various
reaction conditions. 
The lack of complete cleavage of some 
mismatches may pose a difficulty when
the RNase cleavage procedure is used 
for determining the genotype of a diploid 
genome. In cases where 50 percent or 
less of the RNA probe is cleaved, low 
efficiency of cleavage could be an intrin- 
sic property of the mismatch in question, 
or the individual may be heterozygous 
for the mutant allele. This ambiguity may 
often be eliminated by performing a 
time-course of RNase treatment. Alternatively,
as with oligonucleotide probes
(J, 16, 55), the genotype can be unambiguously
determined if probes are available
for both wild-type and mutant alleles.
Thus, it should be possible to use
this method for prenatal diagnosis of
genetic diseases. Partial cleavage at mis- 
matches is not a problem when mapping 
mutations in cloned DNA samples, geno- 
mic DNA from haploid organisms, or 
genomic DNA sequences within the X 
chromosome of human males. 

We have learned that a similar approach
was independently developed to
detect single base substitutions in messenger
RNA (34). In that case 32P-labeled
antisense SP6-RNA was annealed
to total cellular RNA to generate
an RNA:RNA duplex containing a single
base mismatch. As in the case of the
RNA:DNA mismatches analyzed here,



Table 1. Tabulation of the results of an analysis of single base substitutions in the mouse
β-major globin promoter region and in the human β-globin gene.
*The type of mismatch formed in each case. †The fraction of the total protected RNA probe that is
present in cleaved RNA fragments. ‡The mouse promoter mutants are indicated by M followed by a
number designating their position relative to the cap site of β-globin transcription (52). The human
β-thalassemia mutations are indicated by H followed by the name of the mutation. §The probe used in each
case is designated either as sense (S) or antisense (AS). ‖The nucleotides surrounding each mismatch in
the RNA strand are indicated in a 5' to 3' direction. The underlined nucleotide in each case occurs at the
position of the mismatch.
RNA:RNA mismatches are also cleaved
by RNase A. 

Applications. The RNase cleavage pro- 
cedure described provides a sensitive, 
rapid, and simple means of detecting 
single base substitutions in cloned or 
genomic DNA. The 32P-labeled RNA
probes are easily prepared with well- 
characterized SP6-plasmid vectors, the 
required enzymes are commercially 
available, and the electrophoresis in- 
volves the use of standard DNA se- 
quencing gels. In addition, analysis of 
the sizes of the RNase cleavage products 
of the RNA:DNA heteroduplexes not 
only provides evidence for the presence 
of a single base mismatch in the test 
DNA but also makes it possible to local- 
ize the mismatch to within a few nucleo- 
tides. 

The RNase cleavage procedure should 
be applicable to problems where the de- 
tection and localization of single base 
substitutions is important. For example, 
the procedure can be applied to the analysis
of human genetic diseases. By establishing
sets of SP6-plasmids containing
DNA fragments that span an entire
gene, it should be possible to survey 
rapidly even the largest genes for single 
base mutations. Similarly, this method 
should be valuable for detecting neutral 
polymorphisms in genetic linkage stud- 
ies. The ability to detect a large fraction 
of all possible single base substitutions in 
a DNA fragment with a single RNA 
probe represents a significant advance 
over current methods that involve the 
detection of restriction fragment length 
polymorphisms. Another application of 
this procedure is the localization of mu- 
tations that are genetically selected. 
ABSTRACT

We have improved the "polymerase chain reaction" (PCR) to permit 
rapid analysis of any known mutation in genomic DNA. We
demonstrate a system, ARMS (Amplification Refractory Mutation
System), that allows genotyping solely by inspection of reaction
mixtures after agarose gel electrophoresis. The system is 
simple, reliable and non-isotopic. It will clearly distinguish 
heterozygotes at a locus from homozygotes for either allele. The 
system requires neither restriction enzyme digestion, allele- 
specific oligonucleotides as conventionally applied, nor the 
sequence analysis of PCR products. The basis of the invention is 
that unexpectedly, oligonucleotides with a mismatched 3'-residue
will not function as primers in the PCR under appropriate
conditions. We have analysed DNA from patients with
α1-antitrypsin (AAT) deficiency, from carriers of the disease and
from normal individuals. Our findings are in complete agreement 
with allele assignments derived by direct sequencing of PCR 
products. 

INTRODUCTION 

The analysis of nucleic acid sequence is central to biology. 
Determination of variation in DNA sequence between individuals 
underpins molecular genetics. Such analysis is routinely 
performed by examination of restriction fragment length 
polymorphism (RFLP) using the Southern blotting technique 
(1,2,3). This approach has proved enormously useful, generating a 
massive literature, despite the fact that it is relatively slow 
and only allows for the examination of the limited number of 
polymorphic base changes which either create or destroy a 
restriction endonuclease recognition site. Without doubt any 
method which enabled all polymorphic base changes in the genome 
to be examined in a facile manner would be invaluable to the 
molecular genetics community. 
PCR (4) has greatly facilitated the analysis of genomic DNA. It
allows diagnosis of genetic diseases when combined with one of
a variety of other techniques (5,6,7,8,9,10,11,12). We and others
have reported the use of PCR and direct sequencing for diagnosis
of inherited diseases (5,6,7,13). Allele-specific oligonucleotides
(ASOs) (14,15), either radio-labelled (8) or non-isotopically
tagged (9), have been applied to disease diagnosis in
the conventional manner by probing dot blots of PCR products.
Occasionally a point mutation giving rise to a specific phenotype
may create or destroy a restriction enzyme recognition site (2).
In such instances PCR products may (or may not) be cleaved when
treated with the restriction enzyme. The presence or absence of
the restriction site can be used to perform diagnoses as recently
demonstrated for sickle cell anaemia (10). Similarly a
polymorphic restriction site may be in linkage with an
uncharacterised mutation allowing diagnoses to be performed in
informative families by analysis of the amplified restriction
site polymorphism (11,16,17).

We demonstrate here a general technique which allows the scrutiny
of any point mutation polymorphism. The technique requires that
the terminal 3'-nucleotide only of a PCR primer be allele
specific. Thus the primer is synthesised in two forms. The
'normal' form is refractory to PCR on 'mutant' template DNA and
the 'mutant' form is refractory to PCR on 'normal' DNA. In some
instances a single 3'-mismatched base does allow amplification
to proceed. We have shown that introducing additional deliberate
mismatches near the 3' end of appropriate primers ameliorates
this problem.

Molecular characterisation of the genes associated with the more
common inherited disorders is constantly providing new
information about the underlying, disease-associated mutations.
Indeed recent sequencing of mutant β-globin genes has only rarely
resulted in the discovery of novel alleles (18). This implies
that at the β-globin locus characterisation of the molecular
pathology is nearing completion (13). Diseases such as cystic
fibrosis, as yet uncharacterised at the gene level, may have
several RFLPs in linkage disequilibrium with the affected
phenotype (19). Such RFLPs are useful for haplotype analysis and
risk assessment of carrier status particularly where there is a
family history of the disease (20). Furthermore, flanking 
sequences of some such RFLPs have been determined allowing PCR 
followed by restriction analysis for haplotype identification 
(16,17). Some concern has been expressed as to the reliability
and reproducibility of such assays in the absence of rigorous and 
appropriate internal controls. In theory ARMS would allow rapid 
haplotype analysis in such situations, providing sufficient 
genomic sequence is known around the polymorphic restriction 
site. 

The feasibility of our ARMS was demonstrated by the amplification 
of exon III and part of intron III in the human AAT gene (figure 
1). Direct application of ARMS to the clinically significant S 
and Z alleles of AAT (21) was performed and the diagnoses were in 
agreement with the results of sequence analysis of the PCR 
products (5).

MATERIALS AND METHODS 
DNA preparation 

Genomic DNA was isolated from peripheral blood cells as described 
previously (5). 

Oligonucleotide amplification, amplification refractory and
sequencing primers

The common primers 1,2,5 and 6 (figure 1) were those described
previously (5). Their respective sequences were
d(CCCACCTTCCCCTCTCTCCAGGCAAATGGG), d(GGGCCTCAGTCCCAACATGGCTAAGAGGTG),
d(TGTCCACGTGAGCCTTGCTCGAGGCCTGGG) and d(GAGACTTGGTATTTTGTTCAATCATTAAG).
Primer 2a and the 3,4,7 and 8 series of primers (figure 2) as 
well as the primers for the internal control, a 510 base pair 
fragment from the unusually long exon 26 of the human 
apolipoprotein B gene (22) were prepared as described (5) and 
were used without further purification. The sequencing primers 
for initial allele characterisation were those described earlier 
(5). 

Allele characterisation by PCR and direct sequencing
Mutant and normal alleles of the AAT S and Z loci were confirmed
by PCR amplification either as described (5), or as follows:
target sequences were amplified in a 100 µl reaction volume
Figure 1

The human alpha-1-antitrypsin gene. Coordinates are as described
by Long et al. (24). Position 1 is the proposed transcription 
start site. The solid boxes represent the five exons, the S and Z 
loci are shown, as are the respective mutations responsible for 
the S and Z phenotypes. The arrows below and above the gene 
represent the various primers used. Primers prefixed by S are 
those used in direct sequencing of PCR products to confirm 
genotypes prior to ARMS analyses. The remaining primers are those 
used to demonstrate the feasibility of the ARMS concept and those 
used in ARMS analyses, these primers are shown in detail in 
figure 2. 
containing approximately 1 µg genomic DNA, deoxyadenosine
triphosphate (dATP), deoxycytidine triphosphate (dCTP),
deoxyguanosine triphosphate (dGTP) and thymidine triphosphate
(TTP), each 1.5 mM, 67 mM Tris-HCl (pH 8.8), 16.6 mM ammonium
sulphate, 6.7 mM magnesium chloride, 10 mM 2-mercaptoethanol,
6.7 µM EDTA and 1 µM each appropriate amplification primer. Samples
were heated at 100 °C for 5 minutes to denature the DNA. Two units
of Thermus aquaticus (Taq) DNA polymerase (23) (Anglian
Biotechnology) were added to each sample. Samples were overlaid
with light mineral oil (Sigma, 50 µl) then heated at 60 °C for 4
minutes for the first round of DNA synthesis (see Discussion).
Subsequent cycles consisted of a two minute denaturation step at 
92 °C and a combined primer annealing and DNA synthesis step at 
60 °C for 4 minutes. 33 cycles were performed and the DNA 
synthesis step of the final cycle was extended to 20 minutes. 
Direct sequencing of the PCR products was as described previously 
(5). 
ARMS analysis of genomic DNA

The feasibility of the ARMS concept was demonstrated using 
duplicate samples of genomic DNA from one normal individual. AAT 
exon III primers 1 and 2 were used with one sample; primers 1 and 
2a (3'C/T mismatch) were used with the other sample. AAT exon V 
primers 5 and 6 were present in both samples serving as an 
internal control. All primers are shown in figure 1. PCR 
reactions were performed and examined by agarose gel 
electrophoresis (3% Nu-sieve) as previously described (5). 
In applying ARMS to subsequent mutation analysis, primers 
'Control 1' d(CTCTGGGAGCACAGTACGAAAAACCACTT) and 'Control 2'
d(AATGAATTTATCAGCCAAAACTTTTACAGG) were included in all reactions and
served to provide an internal control PCR product. The control 
primers amplify a 510 base pair product within exon 26 of the 
human apolipoprotein B gene (22). 

Genomic DNAs of characterised AAT genotypes MM, MS, MZ and ZZ were
subjected to PCR so as to amplify the internal control fragment.
In separate pairs of reactions each DNA was coamplified
with either the appropriate 'normal' or 'mutant' primer paired with a
common primer for the respective AAT locus. These primers are
shown in figure 2.

The reactions for the ARMS analyses were performed in a volume of
100 µl containing approximately 1 µg genomic DNA. dATP, dCTP, dGTP
and TTP were each 1.5 mM in 67 mM Tris-HCl (pH 8.8), 16.6 mM ammonium
sulphate, 6.7 mM magnesium chloride, 10 mM 2-mercaptoethanol, 6.7 µM
EDTA and 1 µM each appropriate amplification primer. Samples were
heated at 100 °C for 5 minutes to denature the DNA. Two units of Taq
DNA polymerase (Anglian Biotechnology) were added to each sample.
Samples were overlaid with light mineral oil (Sigma, 50 µl) then
heated at 60 °C for 4 minutes for the first round of DNA
synthesis. Subsequent rounds of amplification comprised two
minutes denaturation at 92 °C followed by combined primer
annealing and DNA synthesis at 60 °C for four minutes. 33 cycles
were performed in this way with the final synthesis step extended
to 20 minutes. 18 µl from each reaction was combined with 2 µl of
50% glycerol, 0.2% bromophenol blue in 1X TBE then electrophoresed
on 1.4% agarose gels in 1X TBE containing 0.5 µg/ml ethidium
bromide.
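
As a rough aid to setting up the reaction described above, the stated final concentrations can be converted into absolute amounts per 100 µl reaction. The Python sketch below only restates the concentrations given in the text; the dictionary layout and function name are illustrative, and stock concentrations or pipetting volumes are deliberately not assumed.

```python
REACTION_VOLUME_UL = 100

# Final concentrations from the text, expressed in micromolar.
FINAL_CONCENTRATIONS_UM = {
    "each dNTP": 1500.0,
    "Tris-HCl (pH 8.8)": 67000.0,
    "ammonium sulphate": 16600.0,
    "magnesium chloride": 6700.0,
    "2-mercaptoethanol": 10000.0,
    "EDTA": 6.7,
    "each amplification primer": 1.0,
}


def amount_pmol(concentration_um, volume_ul):
    """Amount in picomoles; 1 µM is equivalent to 1 pmol per µl."""
    return concentration_um * volume_ul


for reagent, conc in FINAL_CONCENTRATIONS_UM.items():
    nmol = amount_pmol(conc, REACTION_VOLUME_UL) / 1000.0
    print(f"{reagent}: {nmol:g} nmol per reaction")
```

For example, 1 µM primer in 100 µl corresponds to 100 pmol of each primer per reaction.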
5' ATTCCCCAACCTGAGGGTGACCAAGAAGCT 3.
3' TAAGGGGTTGGACTCCCACTCGTTCTTCGACGGGTGTGGAG^ 51 
I Primer 2 GTGGAGAATCGGTACAACCCTGACTCCGGG 5' | 

7735 Primer 2a TTGGAGAATCGGTACAACCCTGACTCCGGG 5' 7811 



5' GCCTGATGAGGGGAAACTACAGCACCTGGT Primer 3a 
5' GCCTGATGAGGGGAAACTACAGCACCTGGA Primer 3 
5' CTTCCTGCCTGATGAGGGGAAACTACAGCACCIWaAAAT^ 31 
3' GAAGGACGGACTACTCCCCTTTGATGTCGTGGACC t TTTACTTGAGTGGGTGCTATAGTAGTGGTTCAAGGACCTTO 5' 
I Primer 4 TTTTACTTGAGTGGGTGCTATAGTAGTGGT 5' | 

7642 Primer 4a ATTTACTTGAGTGGGTGCTATAGTAGTGGT 5' 7718 



5' CCGTGCATAAGGCTGTGCTGACCCTCGACA Primer 7g 
5' CCGTGCATAAGGCTGTGCTGACCCTCGACG Primer 7f 
5' CCGTGCATAAGGCTGTGCTGACCATAGACA Primer 7e 
5' CCGTGCATAAGGCTGTGCTGACCATAGACG Primer 7d 
5' CCGTGCATAAGGCTGTGCTGACCATCGCCA Primer 7c
5' CCGTGCATAAGGCTGTGCTGACCATCGCCG Primer 7b
5' CCGTGCATAAGGCTGTGCTGACCATCGACA Primer 7a
5' CCGTGCATAAGGCTGTGCTGACCATCGACG Primer 7
5 ' TCCAGGCCGTGCATAAGGCTCTGCTGACCATCGACgAGAAAGGGACrrc 3* 
3' AGGTCCGGCACGTATTCCGACACGACTGGTAGCTGcTCTTTCCC^ 5' 
I Primer 8 CTCTTTCCCTGACTTCGACGACCCCGGTAC 5' | 

9954 Primer 8a TTCTTTCCCTGACTTCGACGACCCCGGTAC 5' 10030 

Primer 8b CTCATTCCCTGACTTCGACGACCCCGGTAC 5' 
Primer 8c TTCATTCCCTGACTTCGACGACCCCGGTAC 5'



Figure 2 

ARMS primers. The top panel shows the primers used to test the 
ARMS concept. Primer 2 is complementary to the coding strand of 
the AAT gene. Primer 2a shows the 3'-OH mismatched T residue.
Primers 2 and 2a are used in conjunction with primer 1 (figure
1). The centre panel shows the ARMS primers employed at the AAT S
locus. The lower case A/T base pair is the AAT S locus and the
depicted sequence is the normal sequence. The AAT S variant DNA
has a T/A base pair at this position. Primers 3 and 4 correspond
to 'normal' sequence, primers 3a and 4a correspond to 'mutant'
sequence. The lower panel shows the ARMS primers employed at
the AAT Z locus. The lower case G/C base pair is the Z locus and
the normal sequence is shown. The AAT Z variant DNA has an A/T
base pair at this position. Primers 7 and 8 correspond to the
'normal' sequence, primers 7a and 8a correspond to the 'mutant'
sequence. Primers 7, 8, 7a and 8a have not been destabilised. The 
remaining primers in the 7 and 8 series are destabilised and the 
deliberately introduced mismatches are underlined. Primers 7b, 
7d, 7f and 8b correspond to 'normal' sequence (discounting the 
deliberate mismatches) likewise primers 7c, 7e, 7g and 8c 
correspond to 'mutant' sequence, again discounting the introduced 
mismatches. The position numbers are as described by Long et al. 
(24). 
RESULTS

ARMS primers

Figure 1 shows the ARMS primers in relation to the human AAT 
gene. Figure 2 shows each ARMS primer sequence in detail with 
respect to the gene. The position numbers are measured from the 
proposed transcription start site of the AAT gene (24). 
Discounting the variable 3' nucleotides and deliberately
introduced mismatches, the 2,3,7 and 8 series primers are 59% GC
30mers and the 4 series primers are 41% GC 30mers. The common
primers 1,2,5 and 6 are a 63% GC 30mer, 60% GC 30mer, 67% GC
30mer and a 31% GC 29mer respectively.
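
The GC contents quoted for the common primers can be recomputed from the sequences given in Materials and Methods. The short Python sketch below does this; the sequences are transcribed from the text with the line-break spaces removed, and the function name is illustrative.

```python
COMMON_PRIMERS = {
    1: "CCCACCTTCCCCTCTCTCCAGGCAAATGGG",
    2: "GGGCCTCAGTCCCAACATGGCTAAGAGGTG",
    5: "TGTCCACGTGAGCCTTGCTCGAGGCCTGGG",
    6: "GAGACTTGGTATTTTGTTCAATCATTAAG",
}


def gc_percent(sequence):
    """Percentage of G and C residues in a DNA sequence."""
    sequence = sequence.upper()
    gc_count = sum(base in "GC" for base in sequence)
    return 100.0 * gc_count / len(sequence)


for number, sequence in COMMON_PRIMERS.items():
    print(f"primer {number}: {len(sequence)}mer, {gc_percent(sequence):.0f}% GC")
```

Rounded to whole percentages, the computed values agree with the 63%, 60%, 67% and 31% figures stated above.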
Feasibility of the ARMS concept 

We have shown previously that the AAT gene regions bounded by 
primers 1 and 2 and by primers 5 and 6 (figures 1 and 2) can be
coamplified without affecting the efficiency of amplification of
either target performed in isolation (5). We chose to introduce a
3' terminal base change into primer 2. Specifically the 3' dG
residue was replaced by T to provide primer 2a. This substitution
generates a template/primer C/T mismatch. When primers 1,2,5 and
6 are combined in a PCR, both the 360 bp product bounded by
primers 1 and 2, and the 220 bp product bounded by primers 5 and 
6 are observed (figure 3, lane 1). Substitution of primer 2 by 
primer 2a however blocks amplification of the 360 bp product 
while the internal control 220 bp product is still generated 
(figure 3, lane 2). This result is attributable to the lack of a 
3' exonucleolytic proofreading activity of Taq DNA polymerase 
(23) in agreement with the observations of Tindall and Kunkel 
(25). 

ARMS analysis of the AAT S locus 

Genomic DNAs, either homozygous normal with respect to the AAT
gene S allele or heterozygous S were each amplified as described 
in Materials and Methods. Each DNA was separately amplified using 
primers 2 and 3 and primers 2 and 3a. Primer 3 corresponds to the 
normal sequence at the S locus and primer 3a corresponds to the S 
variant sequence. In all reactions the internal control primers 
were also included. 

On the normal DNA substrate, product was derived only from the 
internal control primers and primers 2 and 3. No product was 
Figure 3

Agarose gels showing the feasibility of the ARMS concept (lanes 1
and 2) and ARMS analyses at the AAT S locus (lanes 3 to 10) and
AAT Z locus (lanes 11 to 46). Specific reactions are as described
in the text (Results section).

observed when primer 3a replaced primer 3. When heterozygous S 
DNA replaced homozygous normal DNA the expected 152 bp product 
was generated when either primer 3 or 3a was included in the 
reaction (figure 3, lanes 3 to 6). Primer 3 generates an A/A 
mismatch with S variant DNA and primer 3a generates a T/T 
mismatch with normal DNA. When the ARMS detection primers were
designed for the opposing strand at the S locus (primers 4, 
normal and 4a, S variant) and used for amplification with a 
common primer 1, similar results were obtained. The 510 bp 
internal control was generated but the 267 bp product was 
observed only when the normal primer was applied to normal DNA or 
the normal or S variant primer was applied to heterozygous S DNA. 
The 267 bp product was not generated when the S variant primer 
was applied to normal DNA (figure 3, lanes 7 to 10). Primer 4 
generates a T/T mismatch with S variant DNA and primer 4a 
generates an A/A mismatch with normal DNA.
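
The reading of a pair of ARMS reactions described above can be summarised as a small decision rule. The Python sketch below is an illustration only: it assumes each DNA sample is run in two reactions (one with the 'normal' primer, one with the 'mutant' primer), that the internal control product must be visible in both, and that band presence has already been scored as true or false. The function name and return strings are hypothetical.

```python
def call_genotype(normal_product, mutant_product,
                  control_in_normal_reaction, control_in_mutant_reaction):
    """Interpret one pair of ARMS reactions for a single locus.

    Each argument is True if the corresponding band is seen on the gel.
    """
    if not (control_in_normal_reaction and control_in_mutant_reaction):
        # Without the internal control a missing band is not informative.
        return "uninterpretable: internal control failed"
    if normal_product and mutant_product:
        return "heterozygous"
    if normal_product:
        return "homozygous normal"
    if mutant_product:
        return "homozygous variant"
    return "uninterpretable: no allele-specific product"


# Pattern reported for normal DNA at the S locus: product with the normal
# primer only, internal control present in both reactions.
print(call_genotype(True, False, True, True))
# Pattern reported for heterozygous S DNA: product with both primers.
print(call_genotype(True, True, True, True))
```

This rule only holds when the mismatched primer is fully refractory; where a mismatched primer still yields some product, band presence alone would not be sufficient.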
ARMS analysis of the AAT Z locus 

In analogous experiments to the ARMS analyses of the AAT S locus 
we amplified genomic DNAs characterised as normal, heterozygous 
and homozygous at the AAT Z locus. All reactions contained the 
internal control primers 1 and 2. Initial experiments contained
primer 7 (normal) or 7a (mutant) for amplification with common
primer 6 to yield a 150 bp product (figures 1 and 2). Alternative
experiments targeting the opposing strand and employing primers
8 (normal) or 8a (mutant) for amplification with common primer 5
(figures 1 and 2) would give products of 129 bp. Figure 3, lanes
11-16 and lanes 35-40 show the products generated by the
respective use of primer 7 with normal DNA, primer 7a with normal
DNA, primer 7 with heterozygous DNA, primer 7a with heterozygous
DNA, primer 7 with homozygous Z (ZZ) DNA, primer 7a with ZZ DNA,
primer 8 with normal DNA, primer 8a with normal DNA, primer 8
with heterozygous DNA, primer 8a with heterozygous DNA, primer 8
with ZZ DNA and primer 8a with ZZ DNA. In contrast to the ARMS
data for the AAT S locus, the corresponding results for the AAT Z
locus show reduced specificity in that products were evident
using either normal primer with ZZ DNA and either mutant primer
with normal DNA. Primer 7 with ZZ DNA generates a primer/template
G/T mismatch. Conversely primer 7a with normal DNA generates an
A/C mismatch. Primer 8 with ZZ DNA generates a C/A mismatch and
primer 8a with normal DNA generates a T/G mismatch.
In an attempt to increase the specificity of the ARMS primers we
chose to deliberately introduce an additional mismatch near their
3'-ends. When primers 7f and 7g, which have a deliberate C/T
mismatch seven bases from the 3'-end, were introduced to replace
primers 7 and 7a (figure 2), specificity was improved. Figure 3,
lanes 17-22 show the products of these reactions. In particular,
lane 18 shows the virtual absence of the 150 bp product when the
mutant primer (7g) is applied to normal DNA. Unfortunately the
normal primer (7f) still generates a small amount of product with
ZZ DNA (figure 3, lane 21), but much reduced with respect to the
yield with the mutant primer on ZZ DNA with equivalent internal
control products. When primers 7d and 7e, which have a deliberate
A/G mismatch five bases from their 3'-ends, were introduced to
replace primers 7 and 7a, similar results were obtained (figure 3,
lanes 23 to 28) to those with primers 7f and 7g. When primers 7b
and 7c replaced primers 7 and 7a in the system the desired
specificity was observed. Primers 7b and 7c have a deliberate C/T
mismatch three bases from their 3'-ends. Specifically only primer
7b generated a 150 bp product with normal DNA (fig. 3 lane 29).
The 'mutant' primer 7c failed to do so (fig. 3 lane 30). Both
primers 7b and 7c generated product from heterozygous DNA (fig. 3
lanes 31 and 32). Primer 7b failed to generate the 150 bp product
with ZZ DNA whereas the 'mutant' primer 7c did generate the 150
bp product (fig. 3 lanes 33 and 34). Similar exchange of primers
8 and 8a for primers 8b and 8c (which have an A/A mismatch four
bases from their 3'-ends) showed marginally increased specificity
(fig. 3 lanes 41 to 46).
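
The positions and types of the deliberate mismatches quoted above can be cross-checked against the primer sequences in figure 2. The Python sketch below compares each destabilised 'normal' primer with its parent primer and reports where they differ, counted from the 3' end, together with the resulting primer/template mismatch. Sequences are transcribed from figure 2; the 8 series is drawn 3' to 5' in the figure and is reversed here. Function and variable names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

PRIMER_7 = "CCGTGCATAAGGCTGTGCTGACCATCGACG"   # written 5' -> 3'
DESTABILISED_7 = {
    "7b": "CCGTGCATAAGGCTGTGCTGACCATCGCCG",
    "7d": "CCGTGCATAAGGCTGTGCTGACCATAGACG",
    "7f": "CCGTGCATAAGGCTGTGCTGACCCTCGACG",
}
# The 8 series appears 3' -> 5' in figure 2, so reverse to 5' -> 3'.
PRIMER_8 = "CTCTTTCCCTGACTTCGACGACCCCGGTAC"[::-1]
PRIMER_8B = "CTCATTCCCTGACTTCGACGACCCCGGTAC"[::-1]


def deliberate_mismatches(parent, variant):
    """List (position from 3' end, primer base, template base) differences.

    Both primers are given 5' -> 3'. Position 1 is the 3'-terminal base.
    The template base is the complement of the parent primer base.
    """
    length = len(parent)
    return [
        (length - index, new, COMPLEMENT[old])
        for index, (old, new) in enumerate(zip(parent, variant))
        if old != new
    ]


for name, sequence in DESTABILISED_7.items():
    print(name, deliberate_mismatches(PRIMER_7, sequence))
print("8b", deliberate_mismatches(PRIMER_8, PRIMER_8B))
```

With these transcriptions the comparison gives a C/T mismatch three bases from the 3' end for 7b, A/G at five bases for 7d, C/T at seven bases for 7f and A/A at four bases for 8b, matching the description in the text.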
DISCUSSION

Interest is increasingly being focused on the mutations in the
human genome which produce disease states. The number of such
mutations characterised at the DNA sequence level is increasing
rapidly and this has been substantially aided by the PCR/direct
sequencing approach for the analysis of genomic DNA. We
previously reported that PCR followed by direct sequencing was
absolutely specific for the diagnosis of AAT deficiency (5). In
this communication we present ARMS, a system allowing the direct
analysis of any locus of interest and thus generally applicable
to any inherited disease provided sufficient sequence data is
available. ARMS is simple, rapid and reliable, providing the
capability for accurate pre- and postnatal diagnosis and a means
for heterozygote detection. ARMS is still of benefit even if
disease-associated mutations, as yet uncharacterised, are linked
to characterised polymorphisms. In such instances the technique
will allow detailed haplotype analyses to be performed with a
minimal quantity of DNA. Accurate prenatal diagnoses are
achievable in a few hours if maternal contamination of the foetal
material is avoided. An important practical consideration with
this approach (as with other PCR-based strategies) is that it is
unnecessary to prepare high quality DNA suitable for restriction
enzyme digestion.

A prerequisite of ARMS is the absence of a 3'-exonucleolytic
proofreading activity associated with the DNA polymerase
employed. The lack of such an exonuclease associated with Taq DNA
polymerase has been confirmed here by the successful application
of ARMS and independently (23,25). Another requirement in the
application of ARMS is that 3'-OH terminal mismatched primers are
refractory to extension by the chosen DNA polymerase. This was
not apparent from the work of Tindall and Kunkel (25), since
their exonuclease assay required, and did generate, polymerase
products from C/A mismatched primer/template complexes. Taq
polymerase refractory mismatches have been demonstrated in this
work for some mismatched primers. In instances where the mismatch
is not refractory to extension (as demonstrated with
primer/template G/T, A/C, C/A and T/G mismatches at the AAT Z
locus) further deliberate mismatches to destabilise the
primer/template complexes render the primers increasingly
refractory as the additional mismatch is moved progressively
closer to the 3' end.

Empirically, the degree of specificity observed with mismatched 
primers (and thus the requirement for additional 
destabilisation), correlates with the mismatch type. C/T, A/A
and T/T mismatches (which are all either purine/purine or 
pyrimidine/pyrimidine mismatches) are considerably more 
refractory to extension by Taq polymerase than G/T, T/G, A/C or 
C/A mispairs (all purine/pyrimidine mismatches). We have not yet 
optimised the position for introduction of the deliberate 
mismatches, nor the type of mismatch, neither have we examined 
the effect of deliberate base-pair insertions, deletions or 
modifications which may also be expected to appropriately 
destabilise non-refractory primers. 
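
The empirical grouping described above can be expressed as a simple classification of the 3'-terminal primer/template mismatch. The Python sketch below sorts a mismatch into the same-class group (purine/purine or pyrimidine/pyrimidine, reported here as the more refractory group) or the purine/pyrimidine group. It encodes only the observation made in this work and is not a general predictor of primer behaviour; the function name and labels are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def mismatch_class(primer_base, template_base):
    """Classify a primer/template mismatch by base type."""
    if COMPLEMENT[primer_base] == template_base:
        raise ValueError("bases are complementary, not a mismatch")
    primer_is_purine = primer_base in PURINES
    template_is_purine = template_base in PURINES
    if primer_is_purine and template_is_purine:
        return "purine/purine"
    if not primer_is_purine and not template_is_purine:
        return "pyrimidine/pyrimidine"
    return "purine/pyrimidine"


# Mismatches found to be refractory without further destabilisation.
for pair in (("C", "T"), ("A", "A"), ("T", "T")):
    print(pair, mismatch_class(*pair))
# Mismatches at the Z locus that still allowed some extension.
for pair in (("G", "T"), ("T", "G"), ("A", "C"), ("C", "A")):
    print(pair, mismatch_class(*pair))
```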

It is likely that any of these approaches to deliberately 
destabilise the ARMS primers and hence improve specificity may be 
enhanced by reducing dNTP, magnesium and primer concentrations or 
simply increasing the amplification annealing/extension 
temperature. Conversely an increase in concentration of these 
reagents or a decreased amplification annealing temperature might 
be expected to have an adverse effect on specificities of primers 
which previously generated no unwanted product. 

We have deliberately chosen relatively unforgiving amplification 
conditions in this series of experiments so as to challenge the 
basic concept fully. Removal of tubes from the heating block to 
facilitate the addition of enzyme after initial DNA denaturation 
at 100°C was performed in the experiments described herein. 






Nucleic Acids Research 



Undoubtedly this allows cooling of reaction mixtures and is 
difficult to control precisely. Products were generated at the 
AAT Z locus with mismatched, non-destabilised primers and primers 
with additional mismatches 7, 5 and 4 bases from their 3'- 
termini. It is conceivable that the generation of these products 
will be avoided, if, after heat denaturation of the genomic DNA 
in the presence of primers the reactions are not allowed to 
briefly cool during enzyme addition. These products may result 
from a proportion of template molecules being primed at a less 
stringent lower temperature than the routine extension 
temperature. Any extension products so derived would then be 
correctly paired with the ARMS primer in subsequent cycles, and 
so generate the observed unwanted products. 

Destabilisation of the ARMS primers where necessary such that 
anomalous products are not generated has been one of our 
approaches to the development of this technique at this early 
semi-manual stage. Obviously, further refinement is possible by 
optimising such variables as magnesium, dNTP or Taq polymerase 
concentrations and the precise temperature throughout the ARMS 
cycles. Careful control of the latter variable in particular
should be achievable with fully automated ARMS instrumentation. 
Indeed, addition of Taq polymerase to ARMS reaction mixtures at 
60 °C and ensuring that the reaction temperature never falls below 
this, may significantly increase reaction specificity and avoid 
generation of products on mismatched templates. 

The AAT Z mutation is caused by a G to A transition immediately 
preceded by a C residue. The AAT S phenotype results from an A to 
T mutation. Analysis of single base mutations within coding 
regions causing human genetic disease shows that 35% of such 
mutations have occurred within CpG dinucleotides and that over 90%
of these were either C to T or G to A transitions (26). Since
such mutations would generate the same primer/template mismatches
as at the AAT Z locus, it is expected that destabilisation of
ARMS primers will be required for at least 30% of potential ARMS
applications.
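
The "at least 30%" estimate follows from multiplying the two
fractions quoted from reference 26. The short calculation below is a
sketch of that arithmetic only; it treats "over 90%" as exactly 90%
to obtain a lower bound, which is an assumption of this illustration.

```python
# Fractions quoted in the text (reference 26).
cpg_fraction = 0.35         # point mutations falling within CpG dinucleotides
transition_fraction = 0.90  # "over 90%" of those are C->T or G->A transitions

# Lower bound on mutations giving the same mismatches as the AAT Z locus.
lower_bound = cpg_fraction * transition_fraction
print(f"{lower_bound:.1%}")
```
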

We have chosen to use large (30mer) primers in this assay since 
this allows the use of high annealing temperatures to improve 
specificity and reduces the chance of mispriming elsewhere on
genomic DNA. An alternative approach might be to use shorter
primers which span a point mutation such that discrimination
between 'normal' and 'mutant' loci is achieved by hybridisation
of the primers in an allele-specific manner under appropriately
stringent conditions. This type of analysis, also of the AAT Z 
locus, has been performed by Dermer and Johnson (27). It is 
important to note however that in this analysis the specificity 
of the primers was not absolute, the absence of internal controls 
could conceivably give rise to incorrect diagnoses and 
hybridisation to blots of the reaction mixtures was required. 
Other disadvantages would be that different conditions would have
to be determined for each locus of interest, which would
complicate the simultaneous examination of multiple loci. There
would also be the danger of the primers priming at loci other
than those desired.

It has not escaped our notice that ARMS may have many
applications in medicine and molecular biology. The technique
will be useful for the precise typing of infectious pathogens
where characteristic strain-specific base changes can be
identified. The analysis of oncogene activation is rendered
straightforward as is the detection of deletions in DNA. Many
further applications can also be envisaged in the research
context.



ACKNOWLEDGEMENTS 

We thank Dr D. Holland for the preparation of the 
oligonucleotides used throughout and all the physicians who 
provided samples for testing. 

Patent applications relating to the methods described here are
pending.



REFERENCES 



1. Southern, E.M. (1975) J. Mol. Biol. 98 503-517.

2. Kan, Y.W. and Dozy, A.M. (1978) Lancet ii 910-912.

3. Solomon, E. and Bodmer, W.F. (1979) Lancet i 923.

4. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J.,
Higuchi, R., Horn, G.T., Mullis, K.B. and Erlich, H.A.
(1988) Science 239 487-491.

5. Newton, C.R., Kalsheker, N., Graham, A., Powell, S.,
Gammack, A., Riley, J. and Markham, A.F. (1988) Nucl. Acid.
Res. 16 8233-8243.






6. Stoflet, E.S., Koeberl, D.D., Sarkar, G. and Sommer, S.S.
(1988) Science 239 491-494.

7. Engelke, D.R., Hoener, P.A. and Collins, F.S. (1988) Proc.
Natl. Acad. Sci. USA 85 544-548.

8. Bruun Petersen, K., Kølvraa, S., Bolund, L., Bruun Petersen,
G., Koch, J. and Gregersen, N. (1988) Nucl. Acid. Res.
16 352.

9. Bugawan, T.L., Saiki, R.K., Levenson, C.H., Watson, R.M. and
Erlich, H.A. (1988) Biotechnology 6 943-947.

10. Chehab, F.F., Doherty, M., Cai, S., Kan, Y.W., Cooper, S.
and Rubin, E.M. (1987) Nature 329 293-294.

11. Kogan, S.C., Doherty, M. and Gitschier, J. (1987) New Engl.
J. Med 317 985-990.

12. Levinson, B., Janco, R., Phillips, J. and Gitschier, J.
(1987) Nucl. Acid. Res. 15 9797-9805.

13. Wong, C., Dowling, C.E., Saiki, R.K., Higuchi, R.G., Erlich,
H.A. and Kazazian, H.H. (1987) Nature 330 384-386.

14. Connor, B.J., Reyes, A.A., Morin, C., Itakura, K., Teplitz,
R.L. and Wallace, R.B. (1983) Proc. Natl. Acad. Sci. USA 80
278-282.

15. Orkin, S.H., Markham, A.F. and Kazazian, H.H. (1983) J.
Clin. Invest. 71 775-779.

16. Feldman, G., Williamson, R., Beaudet, A.L. and O'Brien, W.E.
(1988) Lancet ii 102.

17. Williams, C., Williamson, R., Coutelle, C., Loeffler, F.,
Smith, J. and Ivinson, A. (1988) Lancet ii 102-103.

18. Wong, C., Antonarakis, S.E., Goff, S.C., Orkin, S.H., Boehm,
C.D. and Kazazian, H.H. (1986) Proc. Natl. Acad. Sci. USA 83
6529-6532.

19. Estivill, X., Farrall, M., Scambler, P.J., Bell, G.M.,
Hawley, K.M.F., Lench, N.J., Bates, G.P., Kruyer, H.C.,
Frederick, P.A., Stanier, P., Watson, E.K., Williamson, R.
and Wainwright, B.J. (1987) Nature 326 840-845.

20. Farrall, M., Estivill, X. and Williamson, R. (1987) Lancet
ii 155-157.

21. Nukiwa, T., Brantly, M., Garver, R., Paul, L., Courtney, M.,
LeCocq, J.P. and Crystal, R.G. (1986) J. Clin. Invest. 77
528-537.

22. Ludwig, E.H., Blackhart, B.D., Pierotti, V.R., Caiati, L.,
Fortier, C., Knott, T., Scott, J., Mahley, R.W.,
Levy-Wilson, B. and McCarthy, B.J. (1987) DNA 6 363-372.

23. Chien, A., Edgar, D.B. and Trela, J.M. (1976) J. Bacteriol.
127 1550-1557.

24. Long, G.L., Chandra, T., Woo, S.L.C., Davie, E.W. and
Kurachi, K. (1984) Biochemistry 23 4828-4837.

25. Tindall, K.R. and Kunkel, T.A. (1988) Biochemistry 27
6008-6013.

26. Cooper, D.N. and Youssoufian, H. (1988) Hum. Genet. 78
151-155.

27. Dermer, S.J. and Johnson, E.M. (1988) Lab. Invest. 59
403-408.



© 1994 Oxford University Press



Nucleic Acids Research, 1994, Vol 22, No. 20 4167-4175 



Genetic Bit Analysis: a solid phase method for typing
single nucleotide polymorphisms

Theo T.Nikiforov, Robert B.Rendle, Philip Goelet, Yu-Hui Rogers, Michael L.Kotewicz,
Stephen Anderson1, George L.Trainor2 and Michael R.Knapp*

Molecular Tool, Inc., Alpha Center, Hopkins Bayview Research Campus, 5210 Eastern Avenue,
Baltimore, MD 21224, 1Center for Advanced Biotechnology and Medicine, 679 Hoes Lane,
Piscataway, NJ 08854 and 2Chemical and Physical Sciences, Research and Development, Du Pont
Merck Pharmaceutical Company, PO Box 80353, Wilmington, DE 19880, USA



Received June 29, 1994; Revised and Accepted September 7, 1994






ABSTRACT 

A new method for typing single nucleotide poly- 
morphisms in DNA is described. In this method, 
specific fragments of genomic DNA containing the 
polymorphic site(s) are first amplified by the poly- 
merase chain reaction (PCR) using one regular and 
one phosphorothioate-modified primer. The double- 
stranded PCR product is rendered single-stranded by 
treatment with the enzyme T7 gene 6 exonuclease, and 
captured onto individual wells of a 96 well polystyrene 
plate by hybridization to an immobilized oligonucleotide 
primer. This primer is designed to hybridize to the 
single-stranded target DNA immediately adjacent to
the polymorphic site of interest. Using the Klenow
fragment of E.coli DNA polymerase I or the modified
T7 DNA polymerase (Sequenase), the 3' end of the
capture oligonucleotide is extended by one base using
a mixture of one biotin-labeled, one fluorescein-labeled,
and two unlabeled dideoxynucleoside triphosphates.
Antibody conjugates of alkaline phosphatase and
horseradish peroxidase are then used to determine the
nature of the extended base in an ELISA format. This
paper describes biochemical features of this method
in detail. A semi-automated version of the method,
which we call Genetic Bit Analysis (GBA), is being used 
on a large scale for the parentage verification of 
thoroughbred horses using a predetermined set of 26 
diallelic polymorphisms in the equine genome. 

INTRODUCTION 

Mammalian genomes carry numerous single nucleotide 
polymorphisms (SNPs). On average, two to three polymorphic 
sites are found per kilobasepair in human genomic DNA (1). 
Most of these polymorphisms are 'silent' and do not give rise
to detectable phenotypes, but an important subset of mutations
are associated with heritable diseases such as cystic fibrosis (2),
sickle cell anemia (3), colorectal cancer (4), and retinitis
pigmentosa (5, 6).

The wealth of genetic information associated with SNPs can 
be exploited in a wide variety of applications ranging from the 
detection of alleles linked to common genetic diseases, to the 
identification of individuals, to the use of genetic polymorphisms 
in gene mapping projects. Each of these applications involves 
the analysis of a large number of samples and will ultimately 
require rapid, inexpensive, and highly automated methods for 
typing DNA sequence variants. 

Because of the importance of SNPs, a number of methods have 
been described for their detection and typing. In general, methods 
that can be used to discover new mutations can be applied to 
the typing of those that are already known. These methods include 
restriction fragment length polymorphism (RFLP) analysis (7), 
denaturing gradient gel electrophoresis (8), single strand 
conformation polymorphism (SSCP) detection (9), and chemical 
or enzymatic mismatch modification assays (10,11). Although 
powerful, these techniques typically rely on electrophoretic 
separation to detect the polymorphisms, are relatively labor- 
intensive, and are difficult to automate. 

Approaches for the large-scale typing of known mutations have 
been described that can be carried out in a nonelectrophoretic 
mode. Some of these approaches rely on differential hybridization 
to discriminate the different alleles (12, 13). Other methods, 
which base the discrimination on an enzymatic reaction, include 
the oligonucleotide ligation assay (OLA) (14,15), the ligase chain 
reaction (16), the allele-specific polymerase chain reaction 
(17,18), and the primer guided nucleotide incorporation assays 
(19-23). Of these, both the oligonucleotide ligation assay and 
the primer guided incorporation techniques have been developed 
to a stage where they can be used for the typing of a relatively 
large number of samples. 

Here, we present a new primer guided genotyping method 
(Figure 1). The sequence information surrounding the site of 
variation in the target DNA is used to design an oligonucleotide
primer that is complementary to the region immediately adjacent
to, but not including, the variable nucleotide site in the target
DNA. A single-stranded nucleic acid target molecule is
hybridized to this primer immobilized on the polystyrene of a 
96-well microplate. The primer is extended by one haptenated 
dideoxynucleoside triphosphate using a DNA polymerase in the 
presence of all four chain terminating dideoxynucleoside 
triphosphates. Novel haptenated ddNTPs allow discrimination of 
the incorporated nucleotide to be accomplished using standard, 
enzyme-linked colorimetry. 
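
The primer-guided step described above can be sketched in a few
lines: find where the primer anneals on a single-stranded template
and report the one base a polymerase would add to its 3' end. The
function names are illustrative. The sequences are oligonucleotides
# 1112 and # 1676 from Table 1 (biotin omitted); # 1676 is used in
this paper as a hybridization probe, so treating it as an extension
template here is an assumption made purely for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def incorporated_base(primer, template):
    """Base added to the primer's 3' end when annealed to the template.

    Both sequences are written 5'->3'. The primer pairs with its reverse
    complement on the template; the template base immediately 5' of that
    site directs the single-nucleotide extension.
    """
    site = reverse_complement(primer)
    start = template.find(site)
    if start == -1:
        raise ValueError("primer does not anneal to this template")
    if start == 0:
        raise ValueError("no template base beyond the primer's 3' end")
    return COMPLEMENT[template[start - 1]]


PRIMER_1112 = "AGTATAATAATCACAGTATGTTAGC"
OLIGO_1676 = "CCACGGCTAACATACTGTGATTATTATACTTAGAT"  # 5' biotin omitted

print(incorporated_base(PRIMER_1112, OLIGO_1676))
```

With these two sequences the sketch reports C, i.e. the position
that would be read with a labeled ddCTP.
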

Our method differs significantly from other primer guided
genotyping methods described in the literature. Most importantly,
in our method the extension step is carried out in the presence 
of chain terminating ddNTPs only, and therefore only one 
nucleotide can be incorporated at the 3' end of the immobilized 
primer. Second, the immobilization of the primer rather than the 
template permits the removal of the latter from the reaction 
mixture following the extension of the primer and thus elimination 
of all signals that could arise from nonspecific extension at the 
3' end of the template. Third, the technique allows the detection
of two possible alleles in the same well of a microtiter plate which 
results in both operational and biochemical advantages. 

In this paper, we give a detailed description of this DNA typing 
method, called Genetic Bit Analysis (GBA). The 'genetic bit' 
is the term we have adopted for the most elementary form of 
genetic information, namely a single DNA nucleotide. GBA is 
a highly flexible method that can be applied, under a standard 
set of biochemical conditions, to the typing of any nucleic acid 
polymorphism whose sequence is known. In this paper we focus 
on the biochemical basis of GBA. Our experience would suggest 
that features of specificity and convenience inherent in the GBA 
biochemistry permit the method to become widely used for typing 
single nucleotide polymorphisms (SNPs) in both research and 
clinical laboratory applications. 

EXPERIMENTAL 

Enzymes 

Taq DNA polymerase was obtained from Perkin-Elmer. E.coli
DNA polymerase, Klenow fragment, and T7 gene 6 exonuclease
were purified from recombinant E.coli clones containing suitable
expression plasmids (unpublished).

Oligonucleotide synthesis 

All oligonucleotides were synthesized using standard
phosphoramidite chemistry on an Applied Biosystems 392/394 DNA
synthesizer, using reagents obtained from Glen Research
(Sterling, VA). For the synthesis of phosphorothioate primers,
the sulfurizing reagent tetraethylthiuram disulfide (TETD,
Applied Biosystems) was used as recommended by the
manufacturer. All oligonucleotides were deprotected with concentrated
ammonia and desalted using NAP 5 (0.2 µmol scale synthesis)
or NAP 25 (1 µmol) gel filtration columns (Pharmacia).
Oligonucleotides biotinylated at the 5' end were prepared using
a biotin phosphoramidite (DMT-Biotin-C6-PA), obtained from
Cambridge Research Biochemicals, Inc. (Wilmington, DE). The
coupling time of this phosphoramidite was extended to two
minutes. The abasic C3 linker was introduced using the Spacer
Phosphoramidite C3 (Glen Research).

The sequences of the oligonucleotides used in experiments
described in this paper are given in Table 1.

Immobilization of oligonucleotides onto 96-well ELISA plates 

Immulon 4 plates (Dynatech Laboratories, Chantilly, VA) were
used for all experiments shown. Fifty µl aliquots of a 0.2
oligonucleotide solution in a freshly prepared 20 mM solution
of 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride
(EDC, obtained from Sigma) in water were added to individual
wells of a 96 well plate and incubated overnight at room
temperature. The plates were then washed with a solution of
TNTw (10 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.05%
Tween-20). The same procedure was used for other experiments,
in which EDC was replaced in the immobilization step by NaCl,
tetramethylammonium chloride, cetyltrimethylammonium
bromide, and octyldimethylamine hydrochloride.

We have tested a number of different commercially available
96 well plates for their suitability for oligonucleotide
immobilization. In general, plates that are described as having
a more hydrophilic surface gave good results, whereas those with
a hydrophobic surface were found unsuitable. Examples of



Table 1. Sequences of oligonucleotides used in this paper (B denotes a biotin residue; X is a C3 linker; phosphorothioate
bonds are located between the underlined residues)

number  sequence

308     5'AGCCTCCGACCGCGTGGTGCCTGGT
308T    5'AGCCTCCTACCGCGTGGTGCCTGGT
308M1   5'AGCCTCCXACCGCGTGGTGCCTGGT
680     5'GAGATGCAGCTCTAAGTGCTGTGGG
680T    5'GAGATTCAGCTCTAAGTGCTGTGGG
1112    5'AGTATAATAATCACAGTATGTTAGC                 GBA primer
1676    5'BCCACGGCTAACATACTGTGATTATTATACTTAGAT
1464    5'BAATAAGGGGAAACAATTCAGCCCA
501     5'GTTATGGGCTGAATTGTTTCCCCTAATTT
713     5'TTCTACATTCATTTTCTTGTTCTGT
1302    5'GGAGAACAGAACAAGAAAATGAATATGAATGTAGAAGCAT
1473    5'CCACAACAGAACAAGAAAATGAATATGAATGTAGAAGCAT
1474    5'AACAGAACAAGAAAATGAATATGAATGTAGAAGCAT
1214    5'ACCTTCAAAACTCAACTCAGCTCTT                 PCR primer
1215    5'TTTACCAATGAGAAGGACATCTAAG                 PCR primer

suitable plates include Immulon 4 (Dynatech); Maxisorp (Nunc);
and ImmunoWare plates (Pierce). No attachment could be
achieved on Immulon 1 (Dynatech) and Polysorp (Nunc) plates.

DNA isolation and PCR amplification
Horse genomic DNA, isolated from swabs of the nasal mucosa,
was the source of DNA in all PCR amplifications. A foam-tipped
swab on a six-inch plastic stick was inserted one to two inches
into the horse's nostril and immediately immersed in transport
buffer (10 mM Tris-HCl, pH 8.0, 10 mM NaCl, 5 mM EDTA,
0.5% SDS). The swab remained stored in this solution under
ambient conditions until arrival at the laboratory. DNA was
isolated from this mixture by treatment with a mixture of
guanidine hydrochloride and ethanol and adsorption to glass
matrices (e.g., Magic™ resin, obtained from Promega,
Madison, WI), followed by recovery in 10 mM Tris-HCl, pH
8.0, 0.1 mM EDTA. PCRs were carried out in a total volume
of 30-50 µl. The final concentration of the PCR primers was
0.5 µM. Following an initial two minute denaturation step at
95°C, thirty-five cycles were carried out, each consisting of
denaturation (1 min at 95°C), annealing (2 min at 60°C), and
extension (3 min at 72°C). Taq DNA polymerase was used
at a concentration of 0.025 units/µl. The final composition of
the PCR buffer was: 1.5 mM MgCl2, 50 mM KCl, 10 mM
Tris-HCl, pH 8.3, and 170 µg/ml BSA.
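
As a quick check of the time this cycling program occupies, the hold
times given above can simply be summed. This is a sketch only: the
variable names are illustrative, and temperature ramp times, which
depend on the thermal cycler, are not included.

```python
INITIAL_DENATURATION_MIN = 2
CYCLES = 35
STEP_MINUTES = {"denaturation": 1, "annealing": 2, "extension": 3}


def program_minutes(initial, cycles, steps):
    """Total programmed hold time in minutes, excluding ramping."""
    return initial + cycles * sum(steps.values())


total = program_minutes(INITIAL_DENATURATION_MIN, CYCLES, STEP_MINUTES)
print(f"{total} min (about {total / 60:.1f} h)")
```
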

Preparation of single-stranded PCR fragments
Single-stranded DNA was prepared from double-stranded PCR
products as described (25). One of the strands was protected from
exonuclease hydrolysis by the introduction, during synthesis, of
four phosphorothioate groups at the 5' end of one of each pair
of the PCR primers. Following the PCR amplification, T7 gene
6 exonuclease, diluted in a buffer containing 50 mM Tris-HCl,
pH 7.5, 1 mM DTT, and 100 µg/ml BSA, was added to a final
concentration of 0.5 units/µl. Incubation with this enzyme was
for one hour at room temperature.

Hybridization of single-stranded PCR fragments to 
oligonucleotides immobilized onto ELISA plates 
After the exonuclease treatment, an equal volume of 3 M NaCl, 
20 mM EDTA was added to the reaction mixture and 20 µl
aliquots of the resulting solution transferred to individual wells 
containing the appropriate immobilized oligonucleotide molecule. 
Hybridization was carried out for 30 min at room temperature 
and was followed by washing with TNTw. 
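
Adding an equal volume of the salt solution halves its components.
The sketch below works this out; the 40 µl reaction volume is a
hypothetical value, and the small amounts of salt already present in
the PCR and exonuclease buffers are ignored.

```python
def diluted_concentration(stock_conc, stock_volume, other_volume):
    """Concentration after mixing a stock with a volume lacking that solute."""
    return stock_conc * stock_volume / (stock_volume + other_volume)


reaction_ul = 40.0  # hypothetical volume of exonuclease-treated PCR mixture
salt_mix_ul = reaction_ul  # "an equal volume" of 3 M NaCl, 20 mM EDTA

nacl_molar = diluted_concentration(3.0, salt_mix_ul, reaction_ul)
edta_millimolar = diluted_concentration(20.0, salt_mix_ul, reaction_ul)
print(f"NaCl {nacl_molar} M, EDTA {edta_millimolar} mM")
```
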

Labeled dideoxy nucleoside triphosphates 

All biotin- and fluorescein-labeled chain-terminating
2',3'-dideoxynucleoside triphosphates used in the single nucleotide
extension reaction were purchased from Du Pont NEN
(Wilmington, DE). A selection of labeled ddNTPs are
commercially available from that supplier (sold as Renaissance
non-radioactive products). These compounds are derivatives of
aminopropynyl-substituted 2',3'-dideoxypyrimidines or
2',3'-dideoxy-7-deazapurines. The chemistry of these chain terminators and
their use in DNA sequencing have been described (24).

Solid-phase primer extension 

Following the hybridization step, 20 µl of polymerase extension
mix was added to each well. The extension mix contained 20
mM Tris-HCl, pH 7.5; 10 mM MgCl2; 25 mM NaCl; 10 mM
MnCl2; 15 mM sodium isocitrate; 1.5 µM of two unlabeled
2',3'-dideoxynucleoside triphosphates; 1.5 µM of one
biotin-labeled and 1.5 µM of one fluorescein-labeled
2',3'-dideoxynucleoside triphosphate; and the Klenow fragment of E.coli DNA
polymerase I (0.3 units per well). The polymerase was diluted
in a buffer containing 10 mM Tris-HCl, pH 7.5, 5 mM DTT,
and 0.5 mg/ml BSA. The extension reaction was carried out for
10 min at room temperature. The plates were subsequently
washed once with TNTw, once with 0.2 N NaOH, and three
additional times with TNTw.

Colorimetric detection of the incorporated nucleotides
After the extension step, the wells were incubated for 30 min
at room temperature with 40 µl of 1% BSA in TNTw containing
an alkaline phosphatase conjugate of antifluorescein (Biodesign
International, Kennebunk, ME) and a horseradish
peroxidase-conjugated antibiotin (Vector Laboratories, Burlingame, CA).
The dilution factor was 1:500 for the antifluorescein and 1:1000
for the antibiotin. These dilutions were calibrated to give
colorimetric signals of approximately equal intensity with the two
alleles.

Figure 1. Schematic representation of the individual steps of single nucleotide
typing by GBA. In Step 1, a DNA fragment containing the polymorphic site to
be typed is amplified by PCR using one primer containing four phosphorothioate
bonds at the 5' end. In Step 2, the double-stranded PCR product is rendered
single-stranded by treatment with T7 gene 6 exonuclease. In Step 3, the single-stranded
DNA template is captured by hybridization to a primer immobilized to the wells
of a microtiter plate, whereby the polymorphic site of the template is located
immediately downstream from the 3' end of the primer. In Step 4, the 3' end
of the primer is enzymatically extended by one nucleotide using haptenated
ddNTPs. In Step 5, the nature of the incorporated nucleotide(s) is revealed by
an enzyme-linked assay.

After washing, the presence of alkaline phosphatase was
detected first by addition of 100 µl per well of a 1.5 mg/ml
solution of p-nitrophenyl phosphate in 100 mM diethanolamine,
pH 9.5, 20 mM MgCl2. The plate was immediately placed in
a kinetic microplate reader (Molecular Devices Corporation,
Menlo Park, CA), and the development of color was followed
at 405 nm for 2 min. The results were expressed as
mOD405/min. The plates were then washed again and incubated
with 100 µl of a 1 mg/ml solution of o-phenylenediamine in 0.1
M citric acid, pH 4.5, containing 0.012% H2O2. The reaction
was followed by measuring the change of light absorbance at
450 nm as above. Most experiments in microtiter plates described
in this article have been carried out at least in triplicate, and the
results presented are the averaged numbers. The two enzymes
used, alkaline phosphatase and horseradish peroxidase, have
significantly different pH optima (9.8 vs. 4.5); thus, if endpoint
readings are to be taken, it is preferable that the incubations with
the two antibody conjugates are carried out sequentially rather
than simultaneously in order to avoid partial inactivation of the
second antibody conjugate.
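
A kinetic reading expressed in mOD/min is the slope of absorbance
against time. The sketch below fits that slope by ordinary least
squares; the readings are invented for illustration, and the way the
plate reader's own software computes the rate is not described in
this paper.

```python
def rate_mod_per_min(times_s, absorbances):
    """Least-squares slope of absorbance vs. time, returned in mOD/min."""
    n = len(times_s)
    if n < 2 or n != len(absorbances):
        raise ValueError("need matching lists with at least two readings")
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (a - mean_a)
              for t, a in zip(times_s, absorbances))
    slope_od_per_s = sxy / sxx
    return slope_od_per_s * 60.0 * 1000.0  # OD/s -> mOD/min


# Hypothetical readings at 405 nm over the 2 min kinetic window.
times = [0, 30, 60, 90, 120]
readings = [0.100, 0.130, 0.160, 0.190, 0.220]
print(round(rate_mod_per_min(times, readings), 1), "mOD405/min")
```
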

Single nucleotide primer extension in solution
A solution containing 2 pmole of the synthetic template # 501
(see Table 1 for oligonucleotide sequences) and 800 fmole of
the 5' biotinylated primer # 1464 in 20 mM Tris-HCl, pH 7.5,
10 mM MgCl2, 25 mM NaCl, 10 mM MnCl2 and 15 mM
sodium isocitrate, was heated to 95°C for 10 min, then slowly
cooled down to room temperature to anneal the primer to the
template. Aliquots of this solution were then added to four
individual tubes, each containing solutions of one
fluorescein-labeled ddNTP, the three other unlabeled ddNTPs, and the
Klenow polymerase. After 10 min incubation at room
temperature, aliquots of these mixtures were transferred to
individual wells of an avidin-coated microtiter plate to capture
the extension complexes via the 5' biotin residue of the primer.
The wells were then washed with 0.2 N NaOH to remove the
template strand and the presence of fluorescein was detected using
an anti-fluorescein HRP conjugate.

RESULTS AND DISCUSSION 

DNA typing by Genetic Bit Analysis (GBA)
The individual steps of GBA are shown schematically in Figure
1. We have developed a test for the parentage verification of
thoroughbred horses based on GBA, whereby each horse is typed
at 26 different, diallelic loci. The use of 96 well microtiter plates
has allowed us to develop a semi-automated version of the test,
taking advantage of a number of commercially available
automated liquid handlers for that format. In this automated
version of the test, 88 horses are typed, together with suitable
controls, with respect to one locus on one microtiter plate. Figure
2 shows a typical result from such a test. This Figure represents
the results from the typing of 88 horses with respect to locus
JH261-1, a single nucleotide polymorphism present in the equine
genome (manuscript in preparation). The same microtiter plate
was photographed after development of the colorimetric reaction
for alkaline phosphatase, which reveals allele 1 (incorporation of
fluoresceinated ddCTP, top) and, with appropriate processing,
after the colorimetric reaction for horseradish peroxidase, which
reveals allele 2 (incorporation of biotinylated ddUTP, bottom).
Controls for specific and non-specific effects were also run (see
legend to Figure 2). Genotypes are visually scorable: CC
homozygotes give a strong reaction with alkaline phosphatase
but are negative for horseradish peroxidase, TT homozygotes
have the opposite profile, and CT heterozygotes are positive in
both enzymatic reactions.
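
The scoring rule just described maps two signals per well onto a
genotype. The following is a minimal sketch of that rule, assuming a
single cut-off applied to both signals; the threshold and the example
values are hypothetical, and the results presented here were scored
visually rather than with such a function.

```python
def call_genotype(ap_signal, hrp_signal, threshold):
    """Score one well at a C/T locus from its two colorimetric signals.

    ap_signal  : alkaline phosphatase signal (fluorescein-ddCTP, allele C)
    hrp_signal : horseradish peroxidase signal (biotin-ddUTP, allele T)
    threshold  : hypothetical cut-off separating positive from negative
    """
    has_c = ap_signal >= threshold
    has_t = hrp_signal >= threshold
    if has_c and has_t:
        return "CT"
    if has_c:
        return "CC"
    if has_t:
        return "TT"
    return "no call"


# Invented signal pairs (mOD/min) for four wells.
wells = {"A1": (95, 4), "A2": (3, 88), "A3": (60, 55), "A4": (2, 3)}
for well, (ap, hrp) in wells.items():
    print(well, call_genotype(ap, hrp, threshold=20))
```
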
Absorbance values for a comparable set of 88 horses typed
with respect to the same polymorphic locus JH261-1 were
measured in a 96-well spectrophotometer. The graph in Figure
3 depicts the results quantitatively as a scatter plot. The values
for each horse by typing for allele 1 (C) are indicated on the
X-axis; those for allele 2 (T) on the Y-axis. Previous experiments
which typed this locus with respect to several hundred horses
failed to find a third allele (data not shown). The data in Figure
3 are consistent with JH261-1 being a diallelic, single nucleotide
polymorphism (SNP).

Figure 2. Colorimetric detection of two alleles on a single microtiter plate. The
GBA primer # 1112 (see text for sequence) was immobilized in all wells of this
plate using EDC. A 116 bp fragment was amplified from the genomic DNA of
86 different thoroughbred horses, using the primers # 1214 and # 1215. This
PCR fragment contains a CT diallelic polymorphism (JH261). Following the
hybridization of the single-stranded PCR templates to the GBA primer, an extension
reaction was carried out using fluorescein-labeled ddCTP and biotin-labeled
ddUTP. The two haptens were detected as described in the text. A, detection
of ddCTP incorporation using an alkaline phosphatase conjugate; B, detection
of ddUTP incorporation using a horseradish peroxidase conjugate. The plate
contained the following controls: a) no template was added to wells A12 and
B12 (template-independent extension controls); b) a 35 mer synthetic template
(250 fmol) giving incorporation of a ddCTP was added to wells C12 and D12;
c) a similar synthetic template giving incorporation of a ddUTP was added to
wells E12 and F12; d) a mixture of both templates was added to wells G12 and
H12; e) to control for PCR cross-contamination, negative PCR reactions were
carried out and added to wells G11 and H11.

The genotype groups are circled in Figure 3. A summary of
the quantitative data is given in Table 2. It can be seen that the
mean values for the two alleles can be calibrated to be roughly
equivalent. Furthermore, variability between horses is
surprisingly small. In the experiment shown, eight test samples
produced signals with both alleles that were judged unscorable.
Theoretically, this result could have been produced because of
a failure in one or more of the biochemical reactions leading to
the colorimetric data, because of a failure to amplify due to allelic
variability in the primer sites, because of an allelic variability
in the GBA primer site, or because the horses in question
possessed an allele other than A or G in the template strand. We
have investigated a large number of these results further and in
all cases thus far examined, biochemical failure is the explanation
of failure. This has been shown by analysis of the PCR reactions
by gel electrophoresis to be PCR failure in most cases. The failure
rate has been found to be substantially lower when a standardized
control horse genomic DNA is used. For this reason, we believe
that variability in sample quality produces most failure in our
system. However, it can be anticipated that situations leading
to failures of the other three types will arise in complex genomes
and therefore adequate characterization of variability in target
genetic loci is required for optimal utilization of the GBA method.

Immobilization of the GBA primer to microtiter plates
We have previously described the details of our method for 
oligonucleotide immobilization onto polystyrene plates (25). 
Briefly, the method consists of incubating the oligonucleotide on 
the microtiter plates in a dilute solution of an organic or inorganic 
salt, followed by washing with a solution containing 0.05% 
Tween 20. We have shown that the oligonucleotides immobilized 
in this way are capable of specific hybridization to complementary 
templates and have used these findings to develop a convenient 
microplate-based PCR product detection assay. Other authors 
have also successfully immobilized oligonucleotide probes to the 
surface of polystyrene plates using NaCl-containing buffers and 
used those in hybridization-based assays (26). 

In the current experiments, we have tested different compounds 
in the immobilization process and found that a number of 
chemically divergent reagents are capable of promoting this 
process. For example, the efficiency of immobilization of 
oligonucleotide # 1112 using cetyltrimethylammonium bromide
(CTAB) and tetramethylammonium chloride (TMAC) was 
compared (Figure 4). As negative controls, the oligonucleotides 
were added to some of the wells as aqueous solutions, without 
any immobilization reagents. To assess the immobilization 
process, following an overnight incubation with the 
immobilization reagents, the biotinylated oligonucleotide # 1676, 
which is complementary to #1112, was added to the wells of 
the microtiter plate at a range of different concentrations. The 
amount of this biotinylated probe captured by hybridization to 
the immobilized oligonucleotide #1112 was determined by an 
enzyme-linked assay for the biotin residue. The results of this 
experiment are represented graphically in Fig. 4. 

The results shown in Fig. 4 as well as similar results obtained 
with other compounds suggest that the immobilization reagents 
can be divided in two groups. The first group consists of 
chemicals like NaCl and TMAC, which work best when used 
at relatively high concentrations, generally higher than 50 mM, 
and best at 250 to 500 mM. Even concentrations as high as 1 
M can be used without any noticeable adverse effect on the 
immobilization. The second group of immobilization reagents 
consists of chemicals that are characterized by the presence of 
two structural features: a positively charged 'head' and a relatively 
hydrophobic 'tail'. These are the typical features of cationic 
detergents. Representatives of this group are the cationic detergent 
cetyltrimethylammonium bromide (CTAB), octyldimethylamine 
hydrochloride, and 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide 
hydrochloride (EDC). These compounds can be used for 
oligonucleotide immobilization at very low concentrations, as low 
as 0.03 mM for CTAB, but a lower hybridization signal is 
obtained when they are used at higher concentrations. The 



Figure 3. Scatter plot representation of the results from typing of 88 different 
thoroughbred horses with respect to the diallelic polymorphic locus JH261-1. The 
results shown are those from endpoint readings of the plate taken after 24 min 
of incubation with the enzyme substrates for the two antibody conjugates. Horse 
samples are indicated by dots while the control wells are plotted with '+'s. The 
points fall into four categories: 1) high values for C and low values for T; 2) 
high values for T and low values for C; 3) high values for both alleles; 4) high 
values for neither allele. The controls run in duplicate were: a) synthetic 
oligonucleotide molecules which mimic the PCR template for this locus and which 
possess a G at the variable position ('+'s found in group #1); b) which possess 
an A at the variable position ('+'s found in group #2); c) a mixture of these 
synthetic oligonucleotide templates ('+'s found in group #3); and d) PCR reactions 
to which no horse genomic DNA had been added as the amplification template 
('+'s found in group #4). 



Table 2. Typing of a single nucleotide polymorphism in equine DNA (locus JH261-1) 

Genotype   Value AP   (SD)    Value HRP   (SD) 
CC         1.478      0.190   0.051       0.009 
CT         1.044      0.165   0.546       0.121 
TT         0.155      0.012   1.143       0.186 
NS         0.194      0.140   0.097       0.075 



Endpoint readings were taken after 24 min of incubation with the colorimetric 
substrates. The average signals (in OD units at 405 and 450 nm) and corresponding 
standard deviations (SD) for 88 different horses are shown. (NS, no signal). 
inhibitory concentrations differ among the reagents of this group. 
For CTAB, it is as low as 0.5 mM, whereas for EDC it is about 
500 mM. It should be noted that the critical micelle concentration, 
CMC, for CTAB is about 1 mM. Thus, it is possible that once 
micelles are formed, the immobilization is inhibited. Compounds 
of a similar structure, but with a negatively charged 'head' (e.g., 
SDS) are completely inactive as oligonucleotide immobilization 
reagents, as are nonionic detergents. 

The mechanism of immobilization in the presence of EDC or 
cationic detergents is probably very similar to the mechanism 
of transfer of nucleic acids and proteins through an organic phase 
in the presence of detergents described recently (27). 

In another experiment, radioactively labeled oligonucleotides 
were immobilized to polystyrene plates using EDC. The amount 
of immobilized oligonucleotide was then determined by counting 
the amount of radioactivity released upon dissolving the wells 
in toluene. It was thus determined that approximately 1 pmole 
of oligonucleotide is immobilized in each well, which corresponds 
to about 10% of the input oligonucleotide (10 pmole). We also 
found that the input of oligonucleotide can be reduced to about 
3 pmole per well before there is a noticeable decrease in the 
amount of immobilized material. 

PCR amplification, generation of single-stranded DNA 
templates and their capture by hybridization to the GBA 
primers 

PCR normally produces double-stranded products which do not 
hybridize to the immobilized capture oligonucleotide without prior 
strand separation. This strand separation can be achieved by 
treatment with heat or alkali, but we found the efficiency of 
hybridization to be low even with such a denaturation step. 
Asymmetric PCR has also been used for the generation of single- 
stranded products (28). Unfortunately, asymmetric PCR generates 
single-stranded products only linearly, and we found the results 
to be variable. Previously, we have reported a new and efficient 
method for the generation of single-stranded PCR products 
following a regular exponential amplification (25). The method 
is based on the selective protection of one of the strands of the 
PCR product from hydrolysis by T7 gene 6 exonuclease 
through the incorporation of four phosphorothioate bonds into the 5' 
end of that strand using modified PCR primers. The exonuclease 
method generates single-stranded products with high efficiency, 
and they are ideally suited for the subsequent hybridization to 
the immobilized primer. 

The optimal length of the GBA primers immobilized in the 
microtiter plates appears to be between 20 and 25 bases. 
Oligonucleotides shorter than 20 bases usually give lower signals, 
and virtually no signals are seen with primers shorter than 10 
bases. Apparently, parts of the immobilized oligonucleotides are 
inaccessible for hybridization because they are involved in 
interactions with the solid phase. This is supported by the finding 
that some 'hybrid' 25 mer primers that contain only about 12 
to 15 bases at the 3' end exactly matching the template give signals 
as strong as those seen with completely matching 25 mers. 
Primers longer than 30 bases produce only slightly better 
extension signals, but tend to be more prone to template- 
independent extension (see below). 

The solid phase primer extension reaction 
In the enzymatic primer extension step, a single dideoxy- 
nucleoside triphosphate is incorporated at the 3' end of the 
immobilized GBA primer. The nature of the nucleotide at the 
polymorphic site of the template determines which of the four 
ddNTPs contained in the extension mixture will be incorporated 
by the polymerase. We have found that both the modified T7 
DNA polymerase (Sequenase) and the Klenow fragment of E. coli 
DNA polymerase I are suitable enzymes for the primer extension. 
Both polymerases assure a very high signal-to-noise ratio. The 
Klenow polymerase possesses a 3'-5' exonucleolytic activity 
which Sequenase lacks (29). This exonuclease activity did not 
cause problems in the template-directed extension, but we have 
encountered cases of template-independent extension that are the 
result of this activity (see below). 

During the development of GBA, we have found that false 
positive results can be generated by three different mechanisms. 
The first of these is trivial, and consists of self-extension at the 
3' end of the template DNA at the same time as the 3' end of 
the immobilized GBA primer is being extended. This source of 




Figure 4. Comparison of CTAB and TMAC as oligonucleotide immobilization reagents. Oligonucleotide #1112 was immobilized to Immulon 4 plates using varying 
concentrations of these two reagents, and then hybridized to the complementary biotinylated oligonucleotide #1676, used at concentrations of 31, 62, 125, and 250 
fmole per well. The signals obtained in the colorimetric assay are given in mOD450/min. 
'noise' is eliminated simply by briefly washing the plates after 
the polymerase extension step with a 0.2 N NaOH solution. 

The second source of false positive signal, template- 
independent extension of the GBA primers, is the most likely 
problem to be encountered during the development of a GBA 
for a new polymorphism. This is the result of the formation of 
inter- or intramolecular secondary structures by the immobilized 
GBA primers and their subsequent enzymatic extension. Table 
3 summarizes the results seen in the template-independent 
extension of two GBA primers, #308 and #680, as well as some 
of their modified versions. 

A typical example of a primer showing template-independent 
extension is oligonucleotide #308. In this experiment, four 
separate wells were used to characterize the extension reaction. 
In each, only one of the ddNTPs was labeled with biotin while 
the other three were unlabeled. When this primer was 
immobilized on a plate, the extension reaction produced a strong 
signal due to incorporation of C. Analysis of the sequence of 
this oligonucleotide shows that it might be able to form relatively 
stable inter- or intramolecular partially self-complementary 
structures. These are shown in Figure 5. In both structures, the 
highlighted G residue will dictate the incorporation of a C by 
the polymerase. 

To test whether these structures could explain the template- 
independent signal, a modified version of this primer was 
synthesized where the G residue of the original sequence was 
replaced by a T. This modified primer, #308T, showed a strong 
template-independent extension signal in A. In another modified 
version of the same oligonucleotide, #308M1, the G residue of 
the original sequence was replaced by the abasic C3 linker 



Table 3. Template-independent extension of primers #308, #680, and their 
modified versions 

Primer #   base G   base A   base T   base C 
308        0.2      1.3      0.1      21.5 
308T       2.0      48.5     0.2      0.2 
308M1      0.5      0.3      0.9      0.5 
680        0.4      1.2      0.5      65.5 
680T       0.2      58.0     0.7      1.0 

Extension reactions were done with the Klenow polymerase. The signals for the 
four different bases are given in mOD450/min. 



Figure 5. Postulated secondary structures of two GBA primers (#308 and #680) that could lead 
to the observed template-independent extensions. The nature of the incorporated 
nucleotides will be determined by the highlighted bases. 



OPO3CH2CH2CH2 using a commercially available phosphoramidite. 
The modified primer #308M1 did not show any 
template-independent extension. The presence of the abasic linker 
within the sequence of #308M1 did not affect its hybridization 
and extension to synthetic or PCR-derived templates. 

Primer # 680 is an example of template-independent extension 
that is dependent on the polymerase used for extension. With 
this primer, a strong template-independent extension by a C was 
seen only when the extension was carried out with the Klenow 
polymerase (see Table 3), but not when the extension was carried 
out with the modified T7 DNA polymerase (Sequenase). It is 
reasonable to assume that this oligonucleotide forms the self- 
complementary structure shown in Figure 5. The four non-base 
paired residues are cleaved off by the 3'-5' exonuclease of the 
Klenow polymerase which then inserts a C opposite the 
highlighted G. When we replaced this deoxyguanosine residue 
by a thymidine in the modified primer #680T, the template- 
independent noise was changed to an A, as expected (Table 3). 
Because of the possibility of template-independent extension of 
the GBA primers, each new GBA primer should be tested for 
this type of extension before being used in typing experiments. 
This phenomenon would produce inappropriate typings in some 
percentage of cases especially when PCR yield has been low. 
Commercially available computer programs for DNA analysis 
(e.g., 'Oligo', National Biosciences, Inc., Plymouth, MN) can 
be used to predict potential secondary structures. The required 
modifications can be incorporated in the sequence of the 
appropriate GBA primer. 
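
As a rough illustration of the kind of screening described above, the following Python sketch looks for simple intramolecular hairpins in which the 3' end of a primer folds back onto an upstream segment, and reports which ddNTP would then be incorporated. It is a minimal sketch under stated assumptions, not the algorithm of any commercial program: it considers only perfect Watson-Crick stems that include the 3'-terminal base, ignores intermolecular dimers and thermodynamic stability, and does not model trimming of unpaired 3' residues by the Klenow exonuclease. The function names, the example sequence, and the default stem and loop lengths are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hairpin_extension_candidates(primer, min_stem=3, min_loop=3):
    """List hairpins whose stem ends at the 3' terminus of the primer.

    Each result is (i, stem_length, incorporated_base), where i is the
    0-based position that pairs with the 3'-terminal base, and
    incorporated_base is the complement of the base at position i - 1,
    i.e. the nucleotide a polymerase would add in a self-primed extension.
    """
    primer = primer.upper()
    n = len(primer)
    best = {}
    for k in range(min_stem, n // 2 + 1):
        tail = primer[n - k:]                  # 3'-terminal k bases
        target = reverse_complement(tail)      # segment the tail can pair with
        last_start = n - 2 * k - min_loop      # leave room for the loop
        for i in range(1, last_start + 1):     # i >= 1 so a template base exists
            if primer[i:i + k] == target:
                # k increases, so the longest stem for each i is kept
                best[i] = (k, COMPLEMENT[primer[i - 1]])
    return [(i, k, base) for i, (k, base) in sorted(best.items())]


if __name__ == "__main__":
    # Hypothetical primer: the 3' end (CAGGT) can pair with ACCTG near the
    # 5' end, placing a G in the templating position.
    for hit in hairpin_extension_candidates("GACCTGTTTTCAGGT"):
        print(hit)
```

A primer flagged by such a check would still need to be tested experimentally, as recommended above, since the sketch says nothing about whether the structure is stable under the extension conditions.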

The third source of false positive signals is template-dependent, 
but in contrast to the first type of 'noise' described above, it is 
the result of extension at the 3' end of the GBA primer and not 
the template. Signals of variable strength are sometimes generated 
in one of the 'wrong' bases, i.e., a base that is not consistent 
with the sequence of the template to be typed. We have observed 
the same 'noise' profile for both polymerases tested, Sequenase 
and Klenow. This type of 'noise' is notably dependent on the 
amount of template molecules hybridized to the GBA primer and 
can be especially serious when high concentrations (usually, more 
than 500 fmole) of synthetic template molecules are used in GBA. 
In the majority of cases when PCR generated templates are typed, 
this type of noise is undetectable or very weak (signal-to-noise 
ratios >20), but on rare occasions can be strong enough to cause 
false interpretation of the genotyping results. A summary of the 
results of some GBA experiments with template-dependent 'noise' 
is given in Table 4. 

The biochemical basis of this type of 'noise' is uncertain, but 
it may be the result of mishybridization of the GBA primer to 



Table 4. Analysis of template-dependent noise 

GBA experiment             base G   base A   base T   base C 
713+1302                   12.0     35.1     105.4    35.5 
713+1473                   60.4     50.8     160.0    2.5 
713+1474                   23.4     64.7     120.4    2.5 
501+1464 (in solution)     5.5      12.4     85.0     1.0 
501+1464 (solid phase)     7.0      35.5     170.2    2.0 

Signals are given in mOD450/min. 500 fmole of the synthetic templates was used 
in the solid phase extension experiments involving primer #713. Incorporation 
of a labeled T was expected in all of these experiments. 



4174 Nucleic Acids Research, 1994, Vol 22, No. 20 



the template and/or misincorporation of the wrong ddNTP by 
the polymerase. Indeed, the polymerases used may in some 
sequence contexts display a higher misincorporation rate with 
the labeled ddNTPs used. We have analyzed in more detail the 
template-dependent 'noise' observed with the GBA primer #713 
and the synthetic template #1302 (see Table 4). In GBA, this 
template-primer combination gives significant template-dependent 
'noise' in bases C and A. We synthesized and tested two 
modified templates, #1473 and #1474, which differ from 
#1302 only by a few bases at the 5' end, i.e., in a part of the 
template that is not expected to hybridize to the GBA primer. 
In oligonucleotide #1473, three deoxyguanosine residues of 
#1302 are changed to deoxycytidines; in oligonucleotide #1474, 
the part of the synthetic template extending beyond the GBA 
primer is reduced to a single deoxyadenosine. The results 
shown in Table 4 demonstrate that the template-dependent 'noise' 
is influenced by the sequence surrounding the residue of the 
template DNA that directs the dideoxynucleotide incorporation. 
Thus, the replacement of the deoxyguanosines of #1302 with 
deoxycytidines in #1473, or their elimination in #1474, reduces 
the 'noise' in C and, in the case of template #1473, increases 
the noise in G. Such context-dependent effects on the fidelity of 
DNA polymerases have been reported before (30). 

To verify that this 'noise' is not due to the fact that the 
hybridization and extension reactions are carried out on the solid 
phase, an extension experiment was carried out in solution (see 
Experimental). The 5' biotinylated oligonucleotide #1464 was 
used as a primer and annealed to the oligonucleotide #501. The 
expected signal in this template-primer combination is a T. In 
parallel, this typing experiment was carried out on the polystyrene 
solid phase, by immobilizing the primer #501 and using 500 
fmole of oligonucleotide #1464 as the template. These two 
experiments produced remarkably similar signal-to-noise 
profiles (see Table 4). Analogous results were obtained with other 
primer-template combinations (not shown). 
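
To make the comparison of signal-to-noise profiles concrete, the short Python sketch below computes, for each row of Table 4, the ratio of the expected-base signal (T) to the strongest signal in any other base. This definition of signal-to-noise is an assumption made for illustration; the text does not state the exact formula used. The dictionary keys and function name are likewise hypothetical.

```python
# Signals from Table 4, in mOD450/min. T is the expected base in every row.
TABLE_4 = {
    "713+1302": {"G": 12.0, "A": 35.1, "T": 105.4, "C": 35.5},
    "713+1473": {"G": 60.4, "A": 50.8, "T": 160.0, "C": 2.5},
    "713+1474": {"G": 23.4, "A": 64.7, "T": 120.4, "C": 2.5},
    "501+1464 (in solution)": {"G": 5.5, "A": 12.4, "T": 85.0, "C": 1.0},
    "501+1464 (solid phase)": {"G": 7.0, "A": 35.5, "T": 170.2, "C": 2.0},
}


def signal_to_noise(signals, expected_base):
    """Expected-base signal divided by the strongest 'wrong'-base signal."""
    noise = max(value for base, value in signals.items() if base != expected_base)
    return signals[expected_base] / noise


if __name__ == "__main__":
    for experiment, signals in TABLE_4.items():
        ratio = signal_to_noise(signals, "T")
        print(f"{experiment}: S/N = {ratio:.1f}")
```

Taking the strongest wrong base as the noise term is a conservative choice, since that is the signal most likely to cause a false genotype call.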

Although we have not found a general solution for eliminating 
the rare occurrence of template-dependent 'noise', the following 
approaches have been found to reduce or eliminate the problem. 
Hybrid GBA primers of the type 5' X12N13, where each X 
position contains equal amounts of the four bases whereas the 
13 bases at the 3' end match the template exactly, were sometimes 
found to give better signal-to-noise ratios than completely 
matching 25 mers. This could be due to reduced or eliminated 
mishybridization with these primers. The signal-to-noise ratio was 
also improved by performing the extension reaction at 5°C 
rather than at room temperature, and also by decreasing the 
concentration of all ddNTPs in this step. These two factors 
probably affect the fidelity of the polymerase. Finally, this 'noise' 
can usually be avoided by switching the primer protected from 
exonuclease digestion and typing the same polymorphism on the 
opposite DNA strand with a suitable GBA primer. 

Colorimetric detection of the incorporated labeled nucleotide 
As shown above, it is possible to type single nucleotide 
polymorphisms by GBA by including only one labeled ddNTP 
per well. However, the use of two labeled ddNTPs allows the 
determination of both alleles in a diallelic locus to be carried out 
in the same well. This not only reduces the amount of PCR 
generated template required and results in considerable savings 
of labeled chain terminators, but serves as a very useful internal 
control for all post-PCR steps. For example, for a particular 



template, a blank well in the one-base-per-well mode could be 
due to homozygosity of the other type but also to failure of one 
of the post-PCR steps (hybridization, extension). In the two-bases- 
per-well mode, lack of signal for one of the bases can only be 
due to homozygosity of the other type or failure of the enzyme- 
linked assay for this allele. The latter hypothesis can be routinely 
excluded with suitable controls. 
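
The two-bases-per-well logic can be sketched as a simple threshold rule on the two enzyme signals. The Python example below is a minimal sketch, assuming that the AP signal reports the C allele and the HRP signal reports the T allele (as the mean values in Table 2 suggest) and using a single hypothetical cut-off of 0.3 OD units for both channels. In practice the thresholds would be set from the control wells on each plate.

```python
def call_genotype(ap_signal, hrp_signal, threshold=0.3):
    """Call a diallelic C/T genotype from two endpoint OD readings.

    ap_signal  : OD reading assumed to report the C allele
    hrp_signal : OD reading assumed to report the T allele
    threshold  : hypothetical cut-off separating signal from background
    Returns "CC", "CT", "TT", or "NS" (no signal).
    """
    c_present = ap_signal >= threshold
    t_present = hrp_signal >= threshold
    if c_present and t_present:
        return "CT"
    if c_present:
        return "CC"
    if t_present:
        return "TT"
    return "NS"


if __name__ == "__main__":
    # Mean (AP, HRP) values for each class, taken from Table 2.
    table_2_means = {
        "CC": (1.478, 0.051),
        "CT": (1.044, 0.546),
        "TT": (0.155, 1.143),
        "NS": (0.194, 0.097),
    }
    for expected, (ap, hrp) in table_2_means.items():
        print(expected, "->", call_genotype(ap, hrp))
```

An "NS" call in this scheme flags a well for which neither allele was detected, which is the situation that the one-base-per-well mode cannot distinguish from homozygosity for the untested allele.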

To date, we have found no randomly discovered site of single 
nucleotide polymorphism to be tri- or tetra-allelic. This is 
consistent with findings in other laboratories (Deborah Nickerson, 
personal communication). However, if such a site were to be 
encountered, it could be typed using a second well whereby 
incorporation of the other two nucleotides could be examined 
through inclusion of their ddNTP derivatives. Alternatively, other 
haptenated ddNTPs could be used simultaneously assuming an 
appropriate antibody-enzyme conjugate was available. Failure 
to test for the alleles which are rare in the population can produce 
incompatible data in legitimate pedigrees because heterozygotes 
where one chromosome possesses a 'null' allele would appear 
to be a homozygote for one of the tested alleles. A falsely typed 
heterozygous parent would appear to be excluded if its offspring 
inherits the 'null' allele. This explanation can be considered likely 
if many single-locus exclusions are observed for a particular locus 
in a panel of markers used for genetic studies. 
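
The false exclusion caused by an untested ('null') allele can be illustrated with a few lines of Python. This is a sketch under simple assumptions: only the C and T alleles are detectable, any other allele is invisible to the assay, and a parent is considered compatible with an offspring if their typed genotypes share at least one allele. The function names and the label "null" are hypothetical.

```python
def typed_genotype(true_alleles, detectable=("C", "T")):
    """Return the genotype as it would be typed when only some alleles are tested.

    A heterozygote carrying one undetectable allele appears homozygous for
    the detectable one. Returns None if neither allele is detectable.
    """
    seen = sorted(set(a for a in true_alleles if a in detectable))
    if not seen:
        return None
    if len(seen) == 1:
        return (seen[0], seen[0])
    return tuple(seen)


def parent_compatible(parent_typed, child_typed):
    """A parent is compatible if it shares at least one typed allele with the child."""
    return bool(set(parent_typed) & set(child_typed))


if __name__ == "__main__":
    parent_true = ("C", "null")   # true heterozygote with an untested allele
    child_true = ("null", "T")    # inherits 'null' from this parent, T from the other

    parent_typed = typed_genotype(parent_true)
    child_typed = typed_genotype(child_true)
    print("parent typed as:", parent_typed)
    print("child typed as:", child_typed)
    print("compatible:", parent_compatible(parent_typed, child_typed))
```

Here a legitimate parent is apparently excluded, which is the pattern that would show up as repeated single-locus exclusions at one marker in a panel.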

CONCLUSION 

Compelling arguments exist for the development of DNA-based 
assays, which can be performed on a very large scale, for the 
analysis of known polymorphisms in complex genomes. In this 
paper we give biochemical details concerning a new genotyping 
procedure, GBA (genetic bit analysis), that is simple, convenient, 
and automatable. In this method, sequence-specific primer 
annealing is used to select a unique polymorphic site in a nucleic 
acid sample, and interrogation of this site is accomplished via 
the highly accurate DNA polymerase reaction using a set of novel, 
commercially available, non-radioactive dideoxynucleotide 
analogs. 

An important feature of GBA is that it does not rely on nucleic 
acid hybridization for purposes of nucleotide discrimination. This 
is in marked contrast to methods such as allele-specific 
hybridization (12). Rather, nucleic acid hybridization is used only 
to position the template to be typed with respect to a primer 
molecule. Nucleotide discrimination is accomplished using a 
DNA polymerase, an enzyme that has evolved to perform that 
role. As a result, all GBA reactions can be performed under a 
standard set of conditions that have been optimized for high 
throughput operations. Many loci can be tested simultaneously, 
and new tests can be developed rapidly. 

GBA was developed to be a method that can be applied on 
a very large scale using commercial liquid handling devices. In 
our parentage verification assay for thoroughbred horses, as many 
as 180 horses are typed by GBA at 26 diallelic loci by a single 
technician in a single day. However, GBA can easily be carried 
out manually. The signals generated are usually strong enough 
to allow visual interpretation of the results (Figure 2), without 
the need for a spectrophotometer. Alternatively, data can be 
acquired on a large scale using high-volume, stacking plate 
readers. Therefore, it has the potential to become a method for 
the typing of single nucleotide polymorphisms across a broad 
spectrum of applications. 



in electrophoretic mobility (Orita et al., 1989). In this 
technique, sample DNA is digested with a restriction 
enzyme, denatured, and subjected to polyacrylamide 
gel electrophoresis under nondenaturing conditions. 
[...] then detected after electroblotting 
and hybridization with a probe. Under nondenaturing 
conditions, single-stranded DNA has a folded conformation 
[...] interactions. 
Consequently, the conformation, and therefore the 
mobility, is dependent on the sequence. As the mobility 
shift presumably detects conformational 
changes caused by sequence alterations, we named the 
technique single-strand conformation polymorphism 
(SSCP) analysis. 

Here we show a simple and rapid method for detection 
of [...] sequence changes, including single nucleotide 
substitutions. In this method, sequences to be 
examined are amplified and labeled by the polymerase 
chain reaction (PCR) (Saiki et al., 1988) using labeled 
[...] nucleotide, followed by [...] 
electrophoresis for SSCP analysis (PCR-SSCP 
analysis). Using this technique, we identified new 
allelic polymorphisms in Alu repeats at [...] 

MATERIALS AND METHODS 

Cell Lines 

Lung carcinoma cell line A549, fibrosarcoma cell line 
HT1080, and promyelocytic leukemia cell line HL60 
were obtained from the Japanese Cancer Research Resources 
Bank. Colon carcinoma cell line SW480 was 
obtained from the American Type Culture Collection. 

DNA Isolation 

DNA was prepared from leukocytes, placentas, or 
cultured cells by the method of Blin and Stafford 
(1976). 
PCR-SSCP Analysis 

Oligonucleotides were synthesized by the phosphoramidite 
method with a 380A DNA synthesizer and 
purified with OPC columns (Applied Biosystems). The 

names and sequences of the primers used here 
are NA61, GGTGAAACCTGTTTGTTGGA; NA62, 
ATACACAGAGGAAGCCTTCG; KA12, GGCCTGCTGAAAATGACTGA; 
KA13, GTCCTGCACCAGTAATATGC; 
F41, CCACATGGAGTCTTCATAAT; F42, 
CCCCAGGAGTACTTATTTTA; ABG3, AAGTTGATGCTGGATAGAGG; 
ABG4, ATTCT[...]; ADE7, [...]GTACA; 
ADE8, CTCTGCATCAGAGAGGGACA; 
[...]CCTAT[...]AGGGAG; and 
[...]CTGAGGCACATTAAGACAT. 

The 5'-ends of primers (100 pmol) were labeled with 
[γ-32P]ATP (50 pmol, 7000 Ci/mmol, ICN) and polynucleotide 
kinase (5 U, Boehringer-Mannheim) in 10 
[...] DTT at [...]°C for 30 min. The PCR mixture contained 
10 pmol each of the labeled primers (kination products 
added without purification), 2 nmol each of the four 
deoxynucleotides, 0.1 µg of genomic sample DNA, and 
[...] U of Taq polymerase in 10 µl of the buffer specified 
in the GeneAmp kit (Perkin-Elmer Cetus). In some 
cases, 1 µl of [α-32P]dCTP (3000 Ci/mmol, 10 mCi/ml, 
Amersham) and primers without the label were used 
for the amplification. Thirty cycles of the reaction at 
94, 55, and 72°C for 0.5, 0.5, and 1 min, respectively, 
were run in a Thermocycler (Perkin-Elmer Cetus). A 
portion of the reaction mixture (1 µl) was withdrawn 
[...] µl of [...] NaDodSO4 and 10 mM 
[...] of this solution was mixed with 2 µl 
of 95% formamide, 20 mM EDTA, 0.05% bromphenol 
blue, and 0.05% xylene cyanol, heated at 80°C, and 
applied (1 µl/lane) to a 5 or 6% polyacrylamide gel (20 
× 40 × 0.03 cm, 0.5 cm per lane) containing 90 mM 
Tris-borate, pH 8.3, 4 mM EDTA, and 10% glycerol 
when specified. Electrophoresis was performed at 30 
W for 1-6 h with cooling using a fan. We used a hair 
dryer (with heater turned off) and an aluminum plate 
attached to one side of the glass plates for efficient and 
even cooling. The gel was dried on filter paper and 
exposed to X-ray film at -80°C for 0.5-12 h with an 
intensifying screen. 



Direct DNA Sequencing 

Nucleotide sequences were determined using the 
asymmetric PCR method (Gyllensten and Erlich, 1988) 
with slight modifications. An unequal molar ratio (10 
to 1) of the primers was used in 50 cycles of the PCR. 
The amplification mixture (100 µl) was diluted with [...] 
ml of water and concentrated to the original volume 
using a Centricon 30 microconcentrator (Amicon). After 
annealing to a 5'-labeled primer was performed, the 
sequencing reactions were carried out using the termination 
mixtures of a Sequenase kit (USB) and analyzed 
on a 6% polyacrylamide gel containing 7 M urea. 

RESULTS 

PCR-SSCP Analysis of Point Mutations 

The PCR technique can efficiently amplify DNA 
segments of known sequences starting from total genomic 
DNA (Saiki et al., 1988). The amplified products 
are usually electrophoresed in gels and detected either 
by staining or by hybridization after blotting. We found 
that the products can be detected rapidly when labeled 
primers are included in the chain reaction. Electrophoresis 
of the reaction mixture in a thin polyacrylamide 
gel and direct autoradiography of the gel enabled 
the detection of the amplified sequence in a short time 
(Hayashi et al., 1989). We next asked whether the labeled 
amplified DNA fragments could be analyzed for 
their possible sequence changes by the SSCP method 
that we reported previously (Orita et al., 1989). 

The first example is the detection of activation by 
point mutations in NRAS of HL60 (Boss et al., 1984) 
and HT1080 (Brown et al., 1984) cells. Both cell lines 
carry mutations in codon 61 (HL60, CAA to CTA; 
HT1080, CAA to AAA). Other than at these positions 
their sequences are identical to those of the normal 
gene (data not shown). Primers NA61 and/or NA62 
(see Materials and Methods for their sequences) were 
labeled at the 5'-end and included in the mixture for 
the PCR to amplify a 103-bp fragment that encom- 
passes the codon. The products were denatured and 
applied to a nondenaturing polyacrylamide gel. Labeling 
of one (Figs. 1B and 1C) or both (Fig. 1A) of the 
primers facilitated identification of the bands that were 
complementary to each other. In HT1080 DNA, four 
bands could be detected (Fig. 1A, lane 1) when the 
PCR was carried out in the presence of two labeled 
primers. The mobilities of two of the four bands were 
identical to those of the complementary strands of the 
fragment from the normal sample (Fig. 1, lane 2). These 
results indicated that the DNA from HT1080 had two 
different alleles of the NRAS gene, one normal and the 
other mutated. Figure 1 also shows that DNA from 
HL60 cells contained only the mutated NRAS allele 
FIG. 1. PCR-SSCP analysis of point mutations in exon 2 of 
NRAS. Both (A) or either (B and C) of the primers flanking exon 
2 of NRAS were labeled at the 5'-end with 32P and included in the 
PCR containing 0.1 µg of genomic DNA from HT1080 cells (lane 
1), normal leukocytes (lanes 2 and d), and HL60 cells (lane 3). The 
amplification products were denatured, applied to a 6% polyacrylamide 
gel (5 mm per lane), and electrophoresed at 4°C. In lanes d, 
DNA was loaded without denaturation. The exposure for autoradiography 
was 3 h at -80°C with an intensifying screen. 



(lane 3). Furthermore, the mobilities of the single- 
stranded fragments from the HL60 cells were different 
from those of the mutated allele of HT1080 cells (lanes 
3 vs 1). This implies that the two different mutations 
can be distinguished by PCR-SSCP analysis. 

Next we examined whether point mutations of exon 
1 in the KRAS2 gene could be detected by PCR-SSCP 
analysis. The lung carcinoma cell line A549 (Valenzuela 
and Groffen, 1986) and the colon carcinoma cell line 
SW480 (Capon et al., 1983) have been reported to contain 
mutations in codon 12 (A549, GGT to AGT; 
SW480, GGT to GTT). In this experiment we labeled 
both primers, which spanned a 162-bp segment con- 
taining the codon. Figure 2 shows that the mobilities 
of the separated strands of A549 (lane 1) and SW480 
(lane 2) DNAs can be distinguished from those of the 
normal sample (lane 3). Moreover, none of the cells of 
the two lines had the normal allele. These results were 
confirmed by direct DNA sequencing of exon 1 of the 
KRAS2 gene in these cells (data not shown). 

Conditions That Affect Mobility Shifts 

The conformation of single-stranded nucleic acid is 
presumably determined by the balance between ther- 
mal fluctuation and weak local stabilizing forces such 
as short intrastrand base pairings and base stackings. 
Therefore, changes in environmental conditions such 
as temperature and the presence of denaturant are 
likely to cause a change in conformation, which can be 
detected in SSCP analysis as an alteration in mobility. 
That this is indeed true is demonstrated in Fig. 2, where 
the temperature (A vs B) and the presence of glycerol 
(A vs C) during electrophoresis are shown to affect the 
mobilities of separated strands. Yet under all conditions, 
shifts in mobility by mutations are evident. Also, 



the pattern of separations is perfectly reproducible under 
each condition. In rare cases (6 sequences of approximately 
80 examined; this paper and our unpublished 
data), we observed minor faint bands in the lanes 
where denatured samples were loaded but not in the 
lanes where the samples were electrophoresed without 
denaturation (e.g., Figs. 2A and 2B). The relative intensities 
and the mobilities of these bands varied depending 
on the conditions of electrophoresis. We believe 
that these bands are different conformers of the 
same sequence and that more than one metastable 
conformation is sometimes allowed in a given environmental 
condition. 

PCR-SSCP Analysis of DNA Polymorphisms 

Given the results described above, we applied PCR-SSCP 
analysis to the detection of genetic polymorphisms. 
As a model system, we studied the D13S2 locus, 
which is on chromosome 13 and closely linked to the 
RB gene (Cavenee et al., 1984). We have shown that 
several HaeIII fragments of the D13S2 locus contain 
polymorphisms (Orita et al., 1989). One fragment of 
430 bp was subcloned from p9D11, which carried a 
2.3-kb HindIII fragment of the D13S2 locus (Cavenee 
et al., 1984), and its nucleotide sequence was determined. 
Primers corresponding to the sequences at both 
ends were synthesized (F41 and F42, see Materials and 
Methods for the sequences) and used for PCR-SSCP 
analysis of four individuals. The results (Fig. 3A) show 
that one individual (lane 1 of panels II and III) was 
heterozygous for this polymorphism. We determined 
the nucleotide sequences of this fragment from samples 
1 and 2 by the asymmetric PCR and direct sequencing 
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FIG. 2. PCR-SSCP analysis of point mutations in exon 1 of 
KRAS. Genomic DNA from A549 cells (lane 1), SW480 cells (lane
2), and normal leukocytes (lanes 3 and d) was subjected to PCR-
SSCP analysis using a pair of labeled primers flanking exon 1 of
KRAS2, as described in the legend to Fig. 2. In lanes d, DNA was
loaded without denaturation. Electrophoresis was carried out in gels
(5 mm per lane) containing 10% glycerol at room temperature (A),
containing 10% glycerol at 4°C (B), or without glycerol at 4°C (C).
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D13S2 locus. DNAs from leukocytes of four unrelated individuals 
were subjected to PCR-SSCP analysis (A, lanes 1 through 4) as
described in the legend to Fig. 2. Electrophoresis was carried out in
a 5% gel (5 mm per lane) containing 10% glycerol and run at room
temperature. The labeled primers used were (I) F41, (II) F42, and
(III) both F41 and F42. See text for the amplified region. The region
analyzed in (A) was directly sequenced for the samples 1 and 2 (B).
The arrow indicates the band corresponding to a nucleotide
substituted in sample 1.



method. The sequence ladder revealed a single nucleo- 
tide substitution in sample 1 at base 166 from the 3'- 
end of the F42 primer (Fig. 3B, arrow). This substi- 
tution was confirmed by sequencing the complementary 
strand (data not shown). 



SSCP Analysis of Alu Repeats 

Alu repeats are present at 10^5 or more copies per
human haploid genome (Deininger et al., 1981). These repeats
are believed to have spread throughout the genome by
a reverse transcriptase-mediated process (for reviews
see Britten et al., 1988). Most Alu repeats seem to have
no functional role, since the corresponding sites of
nonsimian genomes lack these sequences. We reasoned
that such sequences should be rich in polymorphisms
because they are under no apparent selective pressure.
Using primers that bracket the repeated sequence,
PCR-SSCP can be used to search for polymorphisms
within such regions. Primers that have sequences in
the single-copy regions adjacent to Alu repeats at the
loci of genes for adenosine deaminase (Wiginton et al.,
1988), angiogenin (Kurachi et al., 1985), and β-globin
(Henthorn et al., 1986) were synthesized and PCR-
SSCP analysis was performed using [α-32P]dCTP as a
labeled precursor (Fig. 4). Examination of nine unrelated
individuals revealed that these segments were
highly polymorphic. Heterozygosity was detected for
four, five, and four of nine individuals at the loci of
adenosine deaminase, angiogenin, and β-globin genes,
respectively. Direct sequencing confirmed base
substitutions in these alleles, and the alleles were
inherited according to Mendel's law (data not shown).
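
The heterozygosity counts reported above reduce to simple proportions. The following is a minimal sketch rather than part of the original analysis: the dictionary, constant, and function names are illustrative assumptions, and only the counts (four, five, and four heterozygotes among nine individuals) come from the text.

```python
from fractions import Fraction

# Heterozygote counts reported in the text; nine unrelated individuals per locus.
# The variable names are hypothetical and used only for illustration.
HETEROZYGOTES = {
    "adenosine deaminase": 4,
    "angiogenin": 5,
    "beta-globin": 4,
}
SAMPLE_SIZE = 9


def heterozygosity(n_heterozygous, n_total):
    """Observed heterozygosity as an exact fraction of the individuals examined."""
    return Fraction(n_heterozygous, n_total)


for locus, count in HETEROZYGOTES.items():
    h = heterozygosity(count, SAMPLE_SIZE)
    print(f"{locus}: {h} = {float(h):.2f}")
```

With only nine individuals per locus these proportions are coarse estimates, which is consistent with the text describing the segments qualitatively as "highly polymorphic".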

DISCUSSION 

The method of detecting nucleotide sequence poly- 
morphisms described here is simple, fast, and efficient. 
The target sequence is amplified and labeled simultaneously
by the PCR using labeled primers or labeled
deoxynucleotide. Therefore, neither restriction enzyme
digestion nor hybridization is required.

As primers, we have chosen sequences with balanced
composition in the desired regions. Using these primers
under the PCR conditions described above, no spurious
reaction products were observed, except in a few
exceptional cases. One such case was when primers were
located in exon sequences of RAS genes (data not
shown). Perhaps it is advisable to use nonexon
sequences as primers to avoid possible unwanted
amplifications of several sequences which might be shared
among members of the gene families or pseudogenes.
Another exception occurred when a region within a
cluster of Alu repeats in the β-tubulin locus was
examined (data not shown). In this case, some of the
runthrough products in the first cycle that ended within
the repeated sequence may have served as primers of
the following cycles, and initiated cycles of artifactual
amplifications.

FIG. 4. PCR-SSCP analysis of the Alu repeats. DNA samples of
unrelated individuals were analyzed using pairs of labeled primers that
encompassed the Alu repeats in loci of genes for adenosine deaminase
(A), angiogenin (B), and β-globin (C). Primers and lengths of the
amplified regions were (A) ADE7 and ADE8, 381 bp; (B) AAG1 and
AAG2, 36[?] bp; and (C) ABG3 and ABG4, [?]7 bp. Electrophoresis was
carried out in a 5% polyacrylamide gel (5 mm per lane) containing 10%
glycerol and run at room temperature.

The effect of sequence change on electrophoretic
mobility is unpredictable. It is true that some of the
sequence changes may not appreciably affect the mobility.
However, we observed mobility shift with SSCP
analysis (this report and our unpublished data) in all
12 arbitrarily chosen tumor cell lines that are known
to contain mutated HRAS, KRAS, or NRAS.

In a typical PCR, up to 10% of the substrates (primers
and deoxynucleotides) are incorporated into the
amplified product. Thus, the efficiency of labeling of
target sequences in PCR using labeled primers or
labeled nucleotide is extremely high when compared to
the efficiency in Southern blotting experiments in
which much smaller portions of the label attach to the
target sequences by hybridization. Consequently, in the
current method, the time of exposure to X-ray film is
much shorter than that in the RFLP experiments and
the entire procedure including exposure time can be
completed within 24 h.

The high radioactivity in the target sequence also
permitted the use of a thin polyacrylamide gel (Sanger
and Coulson, 1977). With a 0.3-mm gel, a steep voltage
gradient can be applied without serious Ohmic heating,
so that the time required for electrophoresis is shortened.
This improvement is particularly important since
the conformation of single-stranded DNA is sensitive
to temperature. Also, the high resolution of a thin gel
is obviously advantageous for detection of subtle
conformational changes in the samples.

The electrophoretic mobility of single-stranded DNA
is strongly dependent on environmental conditions,
as can be seen in Fig. 2. We
examined several different conditions of electrophoresis
in a search for better resolution in the PCR-SSCP
analysis. In the analyses of 12 sequences that contain
different mutated RAS and several Alu repeats of
different loci, we performed electrophoresis at room
temperature or at 4°C in the presence or absence of 10%
glycerol. In our experience, mobility shifts caused by
base substitutions of most sequences were best resolved
when electrophoresis was carried out at room temperature
in the presence of 10% glycerol.

PCR-SSCP analysis requires prior information on
the representative sequence for the design of primers.
Within this limitation, this method may be of [illegible]
when a large number of samples must be examined,
such as in linkage analysis of known sequence, [illegible]
of activated oncogenes in cancer tissues, or prenatal
diagnosis for the presence or absence of a [illegible]

The human genome is believed to contain, on average,
one polymorphism every few hundred base pairs
(Botstein et al., 1980). Therefore, PCR-SSCP analysis
of randomly chosen regions of several hundred base
pairs is likely to reveal sequence polymorphisms,
especially when the regions are apparently nonfunctional.
For example, we examined Alu repeat sequences using
single-copy sequences bracketing the repeats as primers,
and found that such interspersed repeated sequences
can be examined by PCR-SSCP analysis. With
other methods that involve hybridization to probes,
analysis of polymorphisms within such sequences is
difficult. As expected, we found that three arbitrarily
chosen Alu repeats (in loci of genes for adenosine
deaminase, angiogenin, and β-globin) were highly
polymorphic. Because of their abundance, the search and
linkage analyses of polymorphisms in the Alu repeats
in the vicinity of genes or even in the anonymous
sequences using PCR-SSCP may be useful in the
construction of a linkage map of the total human genome.
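
The expectation that a region of several hundred base pairs is likely to contain a polymorphism can be illustrated with a rough calculation. This is a sketch under assumptions that are not in the original text: polymorphic sites are treated as scattered independently along the sequence (a Poisson model), and a spacing of 300 bp is a hypothetical stand-in for "every few hundred base pairs".

```python
import math


def prob_at_least_one(fragment_bp, mean_spacing_bp):
    """Probability that a fragment holds at least one polymorphic site.

    Assumes sites occur independently at an average of one per
    `mean_spacing_bp` base pairs (Poisson model; an illustrative assumption).
    """
    expected_sites = fragment_bp / mean_spacing_bp
    return 1.0 - math.exp(-expected_sites)


# 300 bp is a hypothetical value chosen only for illustration.
ASSUMED_SPACING_BP = 300

for length in (100, 300, 600):
    p = prob_at_least_one(length, ASSUMED_SPACING_BP)
    print(f"{length} bp fragment: P(at least one site) = {p:.2f}")
```

The model says nothing about whether a given substitution actually produces a detectable mobility shift, so it bounds what SSCP could find rather than predicting what it will find.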
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Detection of polymorphisms of human DNA by gel electrophoresis 
as single-strand conformation polymorphisms

(mobility shift of separated strands/point mutation/restriction fragment length polymorphism)
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ABSTRACT We developed mobility shift analysis of sin- 
gle-stranded DNAs on neutral polyacrylamide gel electropho- 
resis to detect DNA polymorphisms. This method follows 
digestion of genomic DNA with restriction endonucleases,
denaturation in alkaline solution, and electrophoresis on a
neutral polyacrylamide gel. After transfer to a nylon membrane,
the mobility shift due to a nucleotide substitution of a
single-stranded DNA fragment could be detected by hybridization
with a nick-translated DNA fragment or more clearly
with RNA copies synthesized on each strand of the DNA
fragment as probes. As the mobility shift caused by nucleotide
substitutions might be due to a conformational change of
single-stranded DNAs, we designate the features of single-
stranded DNAs as single-strand conformation polymorphisms
(SSCPs). Like restriction fragment length polymorphisms
(RFLPs), SSCPs were found to be allelic variants of true
Mendelian traits, and therefore they should be useful genetic
markers. Moreover, SSCP analysis has the advantage over
RFLP analysis that it can detect DNA polymorphisms and point
mutations at a variety of positions in DNA fragments. Since
DNA polymorphisms have been estimated to occur every few
hundred nucleotides in the human genome, SSCPs may provide
many genetic markers.



The nucleotide sequences of DNAs in humans are not
identical in different individuals. Nucleotide substitutions
have been estimated to occur every few hundred base pairs
in the human genome (1). Nucleotide sequence polymorphism
has been detected as restriction fragment length
polymorphism (RFLP). RFLP analysis of family members
has been used to construct a genetic linkage map of the
human genome (2, 3), and this analysis has also revealed the
chromosomal locations of genetic elements involved in
hereditary diseases such as Huntington disease (4), adult
polycystic kidney disease (5), cystic fibrosis (6-8), Alzheimer
disease (9, 10), and Duchenne muscular dystrophy (11,
12). Thus prenatal diagnosis of diseases such as cystic fibrosis
is possible with RFLP probes. Recently, RFLP analysis has
indicated specific loss of heterozygosity at particular
chromosomes in cancerous portions of tissues in several
human cancers, including retinoblastoma, Wilms tumor,
small cell carcinoma of the lung, renal cell carcinoma,
bladder carcinoma, breast carcinoma, meningioma, acoustic
neuroma (see ref. 13 for a review), colorectal carcinoma (14,
15), and multiple endocrine neoplasia type 1- or type 2-
associated carcinomas (16, 17). This loss of heterozygosity
suggests the involvement of recessive mutation of particular
genes in development of these cancers.

Although RFLPs are very useful for distinguishing two
alleles at chromosomal loci, they can be detected only when
DNA polymorphisms are present in the recognition sequences
for the corresponding restriction endonucleases or
when deletion or insertion of a short sequence is present in
the region detected by a particular probe. To identify DNA
polymorphisms more efficiently, Noll and Collins used a
simplified method of denaturing gradient gel electrophoresis
(18) that had been developed by Myers et al. (19). As analysis
of mobility shift [probably due to a conformational change of
single-stranded DNAs on polyacrylamide gel electrophoresis
(20)] has been used to detect point mutations (21), in this work
we examined whether the mobility shift of single-stranded
DNA caused by a single nucleotide substitution could be used
to detect nucleotide sequence polymorphisms. The results
indicated that mobility shift analysis is an efficient method for
detecting DNA polymorphisms and for distinguishing the two
alleles at chromosomal loci.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS 

Cell Lines. The human bladder carcinoma cell line T24 was
obtained from the American Type Culture Collection. The
human malignant melanoma cell line SK2 was established
from a tissue that had been maintained in nude mice (22).

DNA Isolation. High molecular weight DNA was prepared
from human leukocytes or cultured human tumor cell lines by
the method of Blin and Stafford (23).

Plasmids. Plasmid pNCO106 was prepared by inserting a
2.9-kilobase pair (kb) Sac I fragment of the HRAS1 gene from
SK2 cells into pUC19 (24). Plasmid pT22 was constructed by
inserting a 6.6-kb BamHI fragment of the HRAS1 gene from
T24 cells into pBR322 (a gift from M. Wigler, Cold Spring
Harbor Laboratory).

Subcloning and Sequencing of DNA Fragments. From
pNCO106 and pT22, a 371-base-pair (bp) Pst I fragment
carrying exon 1 and a 298-bp Pst I fragment containing exon
2 of the HRAS1 gene were isolated and subcloned into the
pGEM-2 vector (Promega Biotec). The nucleotide sequences
of the subcloned fragments were determined by the
dideoxynucleotide method (25), using Sequenase (United States
Biochemical) and the SP6 or T7 promoter primer (Promega
Biotec).

Analysis of Single-Strand Conformation Polymorphisms
(SSCPs). High molecular weight DNA (20 µg) was digested
completely with restriction endonucleases under the conditions
recommended by the suppliers. The reaction mixture
was extracted once with phenol/chloroform (1:1, vol/vol)
and once with chloroform. After addition of 0.1 vol of 3 M
sodium acetate, DNA fragments were precipitated from the

Abbreviations: RFLP, restriction fragment length polymorphism; 

SSCP, single-strand conformation polymorphism. 
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aqueous phase by addition of 2.5 vol of ethanol. Strands were 
separated out by the method of Maxam and Gilbert (20) with
a slight modification. DNA precipitates were dissolved in 20
µl of denaturing solution (0.3 M NaOH/1 mM EDTA) and
then mixed with 3 µl of 50% (vol/vol) glycerol/0.25% xylene
cyanol/0.25% bromophenol blue. The mixture was applied to
a neutral 5% polyacrylamide gel (20 x 40 x 0.2 cm) with or
without 10% glycerol in a well of 10 mm width and subjected
to electrophoresis in 90 mM Tris-borate, pH 8.3/4 mM EDTA
at 180 V for 12-36 hr at 17°C. DNA fragments in the gel were
then transferred to a nylon membrane (Hybond-N, Amersham)
by electrophoretic blotting in 0.025 M sodium phosphate,
pH 6.5, at 1 A for 2 hr at 4°C by the procedure
recommended by the membrane supplier. The membrane
was then dried and baked at 80°C for 2 hr. Hybridization with
32P-labeled DNA probes was performed in 50% (vol/vol)
formamide/6x SSC (1x SSC is 0.15 M sodium chloride/
0.015 M sodium citrate, pH 7.0)/10 mM EDTA/5x Denhardt's
solution (1x Denhardt's solution is 0.02% bovine
serum albumin/0.02% Ficoll/0.02% polyvinylpyrrolidone)/
0.5% NaDodSO4 containing denatured salmon sperm DNA at
100 µg/ml and 10% dextran sulfate at 42°C for 16 hr. The blots
were washed twice in 2x SSC/0.1% NaDodSO4 for 30 min at
65°C and then once in 0.1x SSC at 65°C for 10 min.
Autoradiography was carried out at -80°C for 2-7 days by
exposing the membranes to x-ray film (XAR-5, Kodak) with
an intensifying screen (Cronex Lightning Plus, DuPont).
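
The hybridization and wash buffers are specified as multiples of 1x SSC, so the working molarities follow by simple scaling. The snippet below is a minimal sketch of that arithmetic; the function and constant names are illustrative assumptions, and only the 1x definition (0.15 M sodium chloride/0.015 M sodium citrate) and the 6x, 2x, and 0.1x strengths come from the text.

```python
from decimal import Decimal

# 1x SSC as defined in the text. Decimal avoids binary floating-point noise
# such as 0.15 * 6 evaluating to 0.8999999999999999.
SSC_1X_MOLAR = {
    "sodium chloride": Decimal("0.15"),
    "sodium citrate": Decimal("0.015"),
}


def ssc_molarity(fold):
    """Component molarities of an n-fold SSC solution; `fold` is a string such as "6"."""
    factor = Decimal(fold)
    return {name: conc * factor for name, conc in SSC_1X_MOLAR.items()}


# Strengths used for hybridization (6x) and for the two washes (2x, 0.1x).
for fold in ("6", "2", "0.1"):
    molar = {name: f"{conc.normalize():f} M" for name, conc in ssc_molarity(fold).items()}
    print(f"{fold}x SSC: {molar}")
```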

Analysis of RFLP. RFLP analysis was performed as described
(26). High molecular weight DNA (5 µg) was digested
with an appropriate restriction endonuclease and the digest
was fractionated by electrophoresis in a 0.7% agarose gel.

DNA Probes for Hybridization. Cloned Pst I fragments 371
and 298 bp long carrying exon 1 and 2 of the normal human
HRAS1 gene (27), respectively, were used as specific probes
for the corresponding exons. The 2.8-kb HindIII fragment
isolated from phage 9D11 (28), provided by the Japanese
Cancer Research Resources Bank, was used as a specific
probe for the D13S2 locus on human chromosome 13 (29).
Probes were labeled to a specific activity of 2-10 x 10*
cpm/µg by nick-translation (30) with [α-32P]dCTP (3000
Ci/mmol; 1 Ci = 37 GBq) as a radioactive substrate.
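
The specific activities are quoted in curies with the conversion 1 Ci = 37 GBq. As a worked example of that conversion (a sketch only; the function and constant names are illustrative assumptions), the stated activities of the labeled dCTP and of the diluted UTP used for the RNA probes convert as follows:

```python
GBQ_PER_CI = 37  # conversion factor stated in the text: 1 Ci = 37 GBq


def ci_to_gbq(curies):
    """Convert an activity (or a specific activity per mmol) from Ci to GBq."""
    return curies * GBQ_PER_CI


# Specific activities quoted in the Methods, in Ci/mmol.
for label, ci_per_mmol in (("[alpha-32P]dCTP", 3000), ("[alpha-32P]UTP, diluted", 40)):
    print(f"{label}: {ci_per_mmol} Ci/mmol = {ci_to_gbq(ci_per_mmol)} GBq/mmol")
```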

RNA Probes for Hybridization. Single-stranded RNA
probes were prepared by the method of Melton et al. (31) with
plasmid constructs carrying the fragments used as DNA
probes in the pGEM-2 vector as templates. RNA synthesis on
each strand of the templates was carried out with T7 RNA
polymerase (TOYOBO, Tokyo) or SP6 RNA polymerase
(Amersham) in the presence of [α-32P]UTP as a radioactive
substrate. Concentration of UTP was adjusted to 500 µM by
adding the nonradioactive nucleotide (final specific activity
40 Ci/mmol) to ensure synthesis of full-length RNA copies.
The hybridization conditions and washing procedures were
the same as those for DNA probes.

RESULTS 

Mobility Shift by Single Base Substitution. To determine
whether a single base substitution altered the mobility of
single-stranded DNAs on neutral polyacrylamide gel
electrophoresis, we separated Pst I fragments carrying exon 1 or 2
of the human HRAS1 gene, whose nucleotide sequences are
known. In the human melanoma cell line SK2, one of the two
alleles of the HRAS1 gene is known to be activated by point
mutation of codon 61 in exon 2 (32) and also amplified about
[?]0-fold (33). The human bladder carcinoma cell line T24 has
been reported to contain only one allele of the HRAS1 gene,
which carries a mutated codon 12 in exon 1 (34, 35). From
plasmid constructs pNCO106 and pT22, containing the
transforming allele of the HRAS1 gene of SK2 and T24 cells,
respectively, a 371-bp Pst I fragment carrying exon 1 of the
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gene was isolated and subcloned in the pGEM-2 vector. 
Similarly, a 298-bp Pst I fragment carrying exon 2 of the 
HRASl gene was isolated from the same plasmid constructs 
and subcloned. By determination of the total nucleotide 
sequences of the subcloned fragments, we confirmed the 
single nucleotide substitution at codon 12 in the 371 nucleo- 
tides of the Pst I fragment between the SK2 gene and the T24 
gene (GGC in the SK2 gene and GTC in the T24 gene). The 
nucleotide sequences of the 298-bp Pst I fragments carrying 
exon 2 of the SK2 and T24 genes were also confirmed to differ 
from each other by only one nucleotide in codon 61 (CTG in 
the SK2 gene and CAG in the T24 gene). After denaturation 
in alkaline solution, these cloned Pst I fragments were 
subjected to electrophoresis in neutral 5% polyacrylamide 
gel. The separated strands were then transferred to a nylon 
membrane by electrophoretic blotting and hybridized with 
32P-labeled DNA probes. As shown in Fig. 1A, the pair of
separated strands of the Pst I fragment carrying exon 1 of the
T24 gene (lane 2) moved slightly faster than those of the SK2
gene (lane 1). In the case of the Pst I fragment carrying exon
2, the mobilities of the separated strands of the SK2 gene
(Fig. 1A, lane 3) were significantly different from those of the
T24 gene (lane 4). Three bands were observed in the sample 
from the SK2 gene. Hybridization with single-stranded RNA 
probes showed that the bands with the fastest and the slowest 
mobilities were from the same strand of the fragment, while 
the middle band corresponded to the complementary strand 
(data not shown). Usually the slowest-moving band was the 
major one from the particular strand and the ratio of the 
slowest and the fastest bands varied depending on the 
conditions of electrophoresis, especially the temperature of 
the running gels. These results suggested that a particular 
single-stranded DNA could take at least two different mo- 
lecular shapes, depending on the conditions of electropho- 
resis. 
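
The sequence comparison that underlies this experiment (two fragments of equal length differing at a single position) can be expressed compactly. The following sketch is illustrative only: the function name is an assumption, and the inputs are just the codon triplets quoted in the text, not the full 371- and 298-nucleotide fragments.

```python
def point_differences(seq_a, seq_b):
    """List (position, base_a, base_b) for each mismatch; positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# Codons quoted in the text: SK2 allele first, T24 allele second.
codon_12 = point_differences("GGC", "GTC")
codon_61 = point_differences("CTG", "CAG")

print("codon 12:", codon_12)
print("codon 61:", codon_61)
```

Applied to the full fragment sequences, a result containing exactly one tuple would correspond to the single nucleotide substitution that the sequencing confirmed.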

In the system containing homogeneous cloned DNA frag- 
ments, we could demonstrate mobility shift of single- 
stranded DNAs due to a single base substitution. To deter- 
mine whether the same mobility shift could be observed in the 
presence of DNA fragments other than a target fragment, we 
digested genomic DNAs from the two tumor cell lines SK2 
and T24 with Pst I and subjected the total digests to 
electrophoresis in neutral polyacrylamide gel after
denaturation. As shown in Fig. 1B, the patterns of the separated
strands of the fragments carrying exon 1 or 2 of the HRAS1
gene from the genomic DNAs were essentially the same as
those of the cloned fragments. This result indicated that the 
mobility shift due to a single base substitution of a single- 
stranded DNA fragment in total digests of genomic DNA 





Fig. 1. Mobility shift of single-stranded DNA fragments due to
a single base substitution. (A) Plasmid clones (2 pg) of fragments
carrying exon 1 (371 bp) and exon 2 (298 bp) of the HRAS1 gene from
malignant melanoma SK2 cells (lanes 1 and 3, respectively) and from
bladder carcinoma T24 cells (lanes 2 and 4, respectively) were
digested with Pst I. (B) Total genomic DNAs (20 µg) from SK2 cells
(lanes 1 and 3) and from T24 cells (lanes 2 and 4) were digested with
Pst I. After denaturation, the fragments produced were subjected to
electrophoresis in neutral polyacrylamide gel without glycerol.
Single-stranded DNAs were transferred to a nylon membrane and
hybridized with the 32P-labeled DNA probe for exon 1 of the HRAS1
gene (lanes 1 and 2 in A and B) and the probe for exon 2 of the gene
(lanes 3 and 4 in A and B).
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could be detected and was not influenced by the presence of 
a large amount of unrelated DNA fragments. 

SSCP Analysis of Human DNA at the D13S2 Locus. The
above results encouraged us to apply the mobility shift of
single-stranded DNA due to a single base substitution to 
detection of nucleotide sequence polymorphisms of a partic- 
ular fragment and, as can be done with RFLPs, to distin- 
guishing two alleles at chromosomal loci. As the mobility 
shift might be due to a conformational change of the single- 
stranded DNAs, we designated the polymorphisms detected 
by the method as SSCPs. 

Leukocyte DNA samples from 19 individuals (10 unrelated
and 9 in two families) were digested with Hae III, and SSCPs
of the fragments obtained from a region of about 3 kb at the
D13S2 locus on chromosome 13 were analyzed. When the
digests were subjected to electrophoresis without denaturation
and hybridized with the 32P-labeled 2.8-kb HindIII
fragment as a specific probe for the D13S2 locus, five distinct
double-stranded DNA fragments (F1 to F5 in order of size)
without any RFLP were observed in all DNA samples. The
results on DNA samples 1 and 2 are shown in Fig. 2A as
examples. In contrast with the double-stranded fragments,
separated strands of the same DNA fragments showed SSCPs
with considerable frequency. Representative results are
shown in Fig. 2 B-D. When nick-translated DNA was used as
a probe, SSCPs were apparently observed in at least one of
the four fragments (F2 to F5) in all four DNA samples (Fig.
2B). The mobility shift of one of the strands of fragment F4
in sample 1 was especially marked. However, the mobility
shifts of single strands in other fragments were small and
therefore the difference of the shifts was not clear when both
strands of the fragments were hybridized with the nick-
translated probe. To overcome this disadvantage, RNA
copies (RNA 1 and 2 in Fig. 2 C and D) of each strand of the
D13S2 DNA fragment were prepared separately and used as
probes for hybridization. As shown in Fig. 2 C and D, with
either the RNA 1 or RNA 2 probe SSCPs were clearly
detected in all fragments except fragment F1. In Fig. 2E the
alleles distinguished by SSCPs are summarized. SSCPs found
in fragment F2 by using the RNA 1 probe could distinguish
alleles with three different mobilities, designated as "slow"
(s), "fast" (f), and "very fast" (vf). In addition to these three
alleles, the SSCP analysis of the other DNA sample shown in
Fig. 3A revealed the presence of an allele with "very slow"
(vs) mobility in the fragment. The SSCPs of the other
fragments, F3, F4, and F5, could also distinguish at least two
alleles with "slow" (s) or "fast" (f) mobility. Analysis of 19
DNA samples revealed that mobility shifts found in F4 and F5
were coincidental.

Mendelian Inheritance of SSCPs. To confirm that the 
observed SSCPs of the Hae III fragments of the region at the 
D13S2 locus were due to allelic variants of true Mendelian 
traits, we analyzed the DNAs of nine individuals in two 
related families. In Fig. 3A, SSCPs of fragments F2, F3, and 
F4 and the alleles identified are indicated. In each family, the 
genotypes of the progenies were consistent with the parental 
genotypes. 
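
The consistency check described here, that each child's genotype can be formed from one allele of each parent, is easy to state programmatically. This is a minimal sketch with hypothetical genotypes: the function names and the example family are assumptions for illustration, and only the allele labels (s, f, vf) are taken from the text.

```python
from itertools import product


def possible_child_genotypes(mother, father):
    """Unordered genotypes obtainable by taking one allele from each parent."""
    return {tuple(sorted(pair)) for pair in product(mother, father)}


def is_mendelian(child, mother, father):
    """True if the child's genotype is consistent with the parental genotypes."""
    return tuple(sorted(child)) in possible_child_genotypes(mother, father)


# Hypothetical family, using the mobility-based allele labels from the text.
mother = ("s", "f")
father = ("f", "vf")

consistent_child = is_mendelian(("s", "vf"), mother, father)
inconsistent_child = is_mendelian(("s", "s"), mother, father)
print(consistent_child, inconsistent_child)
```

A check of this kind covers one fragment at a time; the text applies it separately to fragments F2, F3, and F4.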

Relationship Between SSCPs and RFLPs. The same 19 DNA
samples analyzed for SSCPs were also subjected to RFLP
analysis. The DNAs were digested with Msp I or Taq I and
RFLPs were detected by hybridization with the 32P-labeled
DNA probe for the D13S2 locus. Of the 19 DNA samples
digested with Msp I, five samples (sample 2 in Fig. 2, data not
shown; samples 2, 3, 5, and 8 in Fig. 3B) showed RFLP. By
Taq I digestion, RFLP was observed in only one of the DNA
samples (sample 2 in Fig. 2, data not shown). Therefore,
RFLP analysis revealed heterozygosity at the D13S2 locus in
only 5 of 19 individuals, while with SSCP analysis
heterozygosity at the locus was found in at least one of the four Hae
III fragments in 18 of the 19 DNA samples. This fact
demonstrates that SSCP analysis is a superior tool for
detection of genetic polymorphisms.

Factors Affecting SSCP Analysis. The mobility shift of
single-stranded DNAs with DNA polymorphisms observed
on neutral polyacrylamide gel electrophoresis is most likely
due to conformational variations of the molecules. The
conformation of single-stranded nucleic acid is expected to be
affected by environmental factors such as the temperature of
the gel during electrophoresis, the concentration of
electrophoresis buffer, and the presence of denaturing agents in gels.
The mobility shift of the Pst I fragments carrying exon 1 of
the HRAS1 gene shown in Fig. 1A (lanes 1 and 2) was clearly
observed on electrophoresis at 17°C but not prominently at
23°C (data not shown). The pattern of the separated strands
of the fragments carrying exon 2 of the gene observed at 17°C
and shown in Fig. 1A (lanes 3 and 4) was also altered at 23°C.
Thus, the higher temperature might destroy some semistable
conformations. The concentration of the running buffer also
affected the mobility shift. When electrophoresis of the Pst I
fragments analyzed in Fig. 1A was performed in a buffer of
lower concentration (45 mM Tris-borate, pH 8.3/2 mM
EDTA) at 17°C, the mobility shifts observed were similar to
those at the higher temperature (23°C). Presence of 10%
glycerol in gels also affected the mobility shift. However, the
effect of glycerol was rather complicated and mobility shifts
due to DNA polymorphisms were often enhanced by this
reagent. For example, the mobility shifts observed in Fig. 2
were enhanced when electrophoresis was performed in gel
containing 10% glycerol. On the other hand, the mobility shift
shown in Fig. 1 was reduced by the presence of 10% glycerol
in the gel.

Fig. 3. SSCP and RFLP analyses of family members. (A)
Leukocyte DNAs (20 µg) from the family members indicated at the
top (○, females; □, males) were subjected to SSCP analysis using the
D13S2 probe as described in the legend for Fig. 2. As the mobility
shifts found in fragments F4 and F5 were the same, the results with
fragment F5 are not shown. (B) The leukocyte DNAs (5 µg) digested
with Msp I were subjected to RFLP analysis using 32P-labeled
dsDNA as a probe for the D13S2 locus.

DISCUSSION 

By neutral polyacrylamide gel electrophoresis, we could 
separate two single-stranded DNA fragments in which the 
nucleotide sequences differed at only one position. The 
mobility shift due to a single base substitution could be 
observed not only in cloned fragments but also in fragments 
of total genomic DNA after restriction endonuclease diges- 
tion. We applied the method to detect nucleotide sequence 
polymorphisms in human genomic DNA and could observe 
the mobility shift of single-stranded DNA by using a genomic 
sequence probe arbitrarily chosen. Single-stranded DNAs of 



the same nucleotide length can be separated by polyacryl- 
amide gel electrophoresis, probably due to a difference in 
their predominant semistable conformations (20). The mo- 
bility shift of single-stranded DNAs with DNA polymor- 
phisms observed on gel electrophoresis might also be due to 
conformational change, and so we designated the features of 
DNAs as SSCPs. We do not know whether nucleotide 
substitution at any position in a fragment can be detected by 
SSCP analysis, but DNA polymorphisms at a variety of 
positions in a fragment could cause a difference in its 
conformation and result in change in mobility of the single 
strands on gel electrophoresis. Therefore, we thought that 
DNA polymorphism could be detected more frequently by 
SSCP analysis than by RFLP analysis, and our experimental 
results revealed that this was in fact the case. Like RFLP 
analysis, SSCP analysis is simple and does not require 
complicated instruments or specialized techniques. 

As we confirmed that the observed SSCPs were due to 
allelic variation of true Mendelian traits, SSCP analysis of 
DNA fragments could be a useful and simple method for 
elucidating the human genetic linkage map by studies on 
families. Because DNA polymorphisms have been estimated 
to occur once every few hundred nucleotides of the human 
genome (1) and SSCP analysis can reveal nucleotide substi- 
tutions at various positions in a fragment, any restriction 
endonuclease fragment with a nucleotide length suitable for 
strand separation may provide information for distinguishing 
two alleles. Therefore, in theory, on a nylon membrane
carrying separated strands of all possible fragments of ge- 
nomic DNA, DNA polymorphisms at any chromosomal 
locus can be detected by repeated hybridization of the 
membrane with a variety of probes. 

SSCP analysis can also be used to locate genetic elements 
involved in hereditary diseases and to detect DNA aberra- 
tions in human cancers. Comparison of DNA fragments from 
cancerous portions of tissues with those from normal por- 
tions by SSCP analysis can reveal amplified alleles of 
particular genes and loss of heterozygosity at particular 
chromosomal loci. A remarkable advantage of SSCP analysis 
is that it can be used to detect point mutations at various 
positions in a fragment. Recently, by means of the DNA 
polymerase chain reaction (PCR), a DNA segment of a single 
cell or a single sperm has been amplified to an amount 
sufficient for analysis by hybridization (36). Our preliminary 
result suggested that SSCP analysis of DNA segments amplified 
by the PCR technique could be useful for diagnosis of 
genetic aberrations. 
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of Health and Welfare for a Comprehensive 10-Year Strategy for 
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ABSTRACT 

A novel method that allows direct analysis of single 
base mutation by the polymerase chain reaction (PCR) 
is described. The method utilizes the finding that PNAs 
(peptide nucleic acids) recognize and bind to their 
complementary nucleic acid sequences with higher 
thermal stability and specificity than the corresponding 
deoxyribooligonucleotides and that they cannot 
function as primers for DNA polymerases. We show 
that a PNA/DNA complex can effectively block the 
formation of a PCR product when the PNA is targeted 
against one of the PCR primer sites. Furthermore, we 
demonstrate that this blockage allows selective 
amplification/suppression of target sequences that 
differ by only one base pair. Finally we show that PNAs 
can be designed in such a way that blockage can be 
accomplished when the PNA target sequence is located 
between the PCR primers. 

INTRODUCTION 

A multitude of human genetic diseases result from single base 
mutations in specific genes (1). To facilitate the in vitro analysis 
of such mutations several techniques have been devised. These 
include enzymatic (2) or chemical (3-4) probing of mismatch 
complexes, gradient gel electrophoresis (5), use of nucleotide 
analogues (6), hybridization with allele specific oligonucleotide 
probes (7) and the oligonucleotide ligation assay (8). To enhance 
the sensitivity of these methods the target nucleic acid is normally 
amplified to detectable quantities by the polymerase chain reaction 
(PCR) (9). 

PCR itself has also been used to analyse directly single base 
mutations by using allele specific oligonucleotides as amplification 
primers (10-12). Unfortunately, the general applicability of this 
approach is limited by the fact that the majority of 
primer-template mismatches have no significant effect on the 
amplification process (13). 



We recently found that PNA (Peptide Nucleic Acid) is a potent 
DNA mimic in terms of sequence specific hybridization, and 
obtained results showing that at physiological ionic strength 
PNA/DNA duplexes are generally 1°C per base pair more stable 
thermally than the corresponding DNA/DNA duplexes (14-17). 
Furthermore, our results indicated that the base pair mismatch 
discrimination is greater for PNA/DNA than for the 
corresponding DNA/DNA duplexes. In the special case of 
homopyrimidine PNA, (PNA)2/DNA triplexes are formed of 
unprecedented thermal stability and sequence discrimination with 
complementary oligonucleotides. For example, the complex of 
PNA T10 with dA10 exhibits a Tm of 76°C, with ΔTm's for base 
mismatches ranging between 10-13°C (18). Similarly, Tm for 
a PNA (T4CT5)2/dA4GA5 complex is 79°C, with ΔTm's for 
base mismatches ranging between 30-35°C (18). 

Taking advantage of these unique properties of PNA, and the 
fact that PNA cannot function as a primer for DNA polymerase, 
we now report that PNA can be used to block a PCR amplification 
process in a sequence specific manner. Furthermore, we show 
that the specificity of this approach, termed 'PCR clamping', is 
such that two alleles which differ by only one base pair can be 
discriminated. Thus this technique allows for direct analysis of 
single base mutations by PCR. 

MATERIALS AND METHODS 

The PNAs H-T10-LysNH2, H-T5CT4-LysNH2, PNA62 
(H-TGTACGTCACAACTA-NH2) and PNA176 
(H-GATCCTGTACGTCACAACTA-NH2) were synthesized as described (15-17). 
The plasmid pT10KS was constructed by cloning the complementary 
oligonucleotides 5'-GATCCT10G and 5'-GATCCA10G 
into the BamHI site of the Bluescript KS+ plasmid (Stratagene). 
The plasmid pT9C was constructed by cloning the complementary 
oligonucleotides 5'-TCGACT5CT4G and 5'-TCGACA4GA5G 
into the SalI site of pUC19. The plasmid p62-1 was constructed 
by first cloning the complementary oligonucleotides 
5'-GATCCTGTACGTCACAACTA-3' and 5'-GATCTAGTTGTGACGTACAG-3' 
into the BamHI site of pUC19 to obtain p62, followed 
by cloning of a 556 bp PstI/HindIII fragment from the phage λ 
genome into the PstI/HindIII site of p62. The plasmids p62-A-KS, 
p62-T-KS and p62-C-KS were isolated from a mini-library 
constructed by cloning the degenerate, complementary 
oligonucleotides 5'-TCGACTCTAGAGGATCTAGTTGTGANGTACAG-3' 
and 5'-GATCCTGTACNTCACAACTAGATCCTCTAGAG-3' 
into the SalI/BamHI site of Bluescript KS+ 
(Stratagene). The control plasmids pCKS and pCKS-1 were 
Bluescript KS+ derivatives which do not contain a target 
sequence for any of the PNAs used in this study. Using standard 
techniques (19) plasmids were isolated from selected clones of 
recombinant E. coli JM103, purified by buoyant density 
centrifugation in CsCl gradients and sequenced by the dideoxy 
method. 
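
The pairing of the two oligonucleotides used to build p62 can be checked in a few lines. The following is only an illustrative sketch: the sequences are those quoted above, while the helper names and the assumption that the leading GATC of each oligonucleotide is the single-stranded BamHI-compatible overhang are ours.

```python
# Sketch: check that the two p62 cloning oligonucleotides anneal as a duplex
# with 5'-GATC overhangs (assumed here to be the BamHI-compatible ends).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

OLIGO_TOP = "GATCCTGTACGTCACAACTA"     # 5'->3', as quoted in the text
OLIGO_BOTTOM = "GATCTAGTTGTGACGTACAG"  # 5'->3', as quoted in the text
OVERHANG = "GATC"                      # assumed single-stranded 5' overhang
PNA62 = "TGTACGTCACAACTA"              # PNA62 base sequence (H- to -NH2)

def duplex_core(oligo: str, overhang: str = OVERHANG) -> str:
    """Strip the 5' overhang and return the part expected to be base paired."""
    if not oligo.startswith(overhang):
        raise ValueError("oligo does not start with the expected overhang")
    return oligo[len(overhang):]

# The paired regions should be reverse complements of each other.
cores_pair = reverse_complement(duplex_core(OLIGO_TOP)) == duplex_core(OLIGO_BOTTOM)

# The PNA62 base sequence should be contained in the first oligonucleotide.
pna62_site_present = PNA62 in OLIGO_TOP

print(cores_pair, pna62_site_present)
```

Both checks come out true for the sequences as printed, which also serves as a consistency check on the transcription of the oligonucleotides.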

The following oligonucleotide primers were used in the PCR 
reactions: reverse primer (5'-GAAACAGCTATGAC-3'), 
reverse-1 primer (5'-CACACAGGAAACAGCTATGAC-3'), 
forward primer (5'-GTAAAACGACGGC-3'), forward-1 primer 
(5'-GTAAAACGACGGCCAGT-3'), proximal primer 
(5'-TACCCGGGGATC-3') and primers specific for each of the p62 
plasmids: p62-1 primer (5'-TGTACGTCACAACTA-3'), 
p62-A-1 primer (5'-TGTACATCACAACTA-3'), p62-A-2 
primer (5'-CCTGTACATCACAACTA-3'), p62-A-3 primer 
(5'-ATCCTGTACATCACAACTA-3'), p62-A-4 primer 
(5'-GGATCCTGTACATCACAACTA-3'), p62-A-5 primer 
(5'-GTGGATCCTGTACATCACAACTA-3'), p62-T primer 
(5'-GGATCCTGTACTTCACAACTA-3') and p62-C primer 
(5'-GGATCCTGTACCTCACAACTA-3'). 
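
As a cross-check on the primer list, the allele specific primers can be compared base by base with the PNA62 sequence. This is a minimal sketch, assuming the primers are aligned to PNA62 at their 3' ends; the function name and the dictionary are illustrative and not part of the original protocol.

```python
# Sketch: locate the single-base difference between each allele specific
# primer and the PNA62 sequence, aligning the two at their 3' ends.
PNA62 = "TGTACGTCACAACTA"

PRIMERS = {
    "p62-1": "TGTACGTCACAACTA",
    "p62-A-1": "TGTACATCACAACTA",
    "p62-A-5": "GTGGATCCTGTACATCACAACTA",
    "p62-T": "GGATCCTGTACTTCACAACTA",
    "p62-C": "GGATCCTGTACCTCACAACTA",
}

def mismatches_vs_pna(primer: str, pna: str = PNA62):
    """Return (position, pna_base, primer_base) tuples, positions 1-based
    and counted from the first base of the PNA sequence."""
    tail = primer[-len(pna):]  # the 3' portion that overlaps the PNA site
    return [(i + 1, p, q) for i, (p, q) in enumerate(zip(pna, tail)) if p != q]

for name, seq in PRIMERS.items():
    print(name, len(seq), mismatches_vs_pna(seq))
```

Each mutant primer differs from PNA62 at a single position, the sixth base of the 15mer, while the p62-1 primer matches it exactly.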

PCR amplifications were carried out in a 50 µl volume 
containing 0.1 ng of each plasmid, 0.2 µM of each primer, 200 µM 
dNTP and buffer (10 mM Tris-HCl, pH 8.3 (at 25°C), 10 mM 
KCl, and 3 mM MgCl2). The PCR reactions were overlaid with 
2 drops of paraffin oil and incubated at 96°C for 2 minutes before 
the amplification process was initiated by the addition of 3 U of 
the Stoffel polymerase (Perkin Elmer Cetus) or 1 U of the supertaq 
polymerase (AH Diagnostics). When using the supertaq 
polymerase the buffer was changed to (50 mM Tris-HCl, pH 
9.0 (25°C), 50 mM KCl, 7 mM MgCl2, 16 mM (NH4)2SO4 and 
0.2 mg/ml BSA). Experiments were carried out using either a 
Minicycler™ (MJ Research) amplifier machine or a LEP 
amplifier machine (IgG Biotech). Comparable results were 
obtained independent of the machine and polymerase used. PCR 
cycle profiles and concentrations of PNAs were as indicated in 
the figure legends. 
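
For orientation, the concentrations above translate into the following absolute amounts per 50 µl reaction. The sketch only restates the quoted concentrations; the function and variable names are ours.

```python
# Sketch: convert the stated final concentrations into amounts per reaction.
REACTION_VOLUME_UL = 50.0

def amount_pmol(conc_um: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """µM x µl = pmol, because 1e-6 mol/l x 1e-6 l = 1e-12 mol."""
    return conc_um * volume_ul

primer_pmol = amount_pmol(0.2)         # each primer at 0.2 µM
dntp_nmol = amount_pmol(200.0) / 1000  # dNTP at 200 µM, reported in nmol
pna_pmol = amount_pmol(17.8)           # highest PNA62 concentration used

print(primer_pmol, dntp_nmol, pna_pmol)
```

At the highest PNA concentration used in the figures (17.8 µM) the PNA is therefore present in roughly 90-fold molar excess over each primer.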

Tm values for PNA/DNA and DNA/DNA duplexes were 
determined spectrophotometrically at 260 nm in 10 mM 
Na-phosphate, 150 mM NaCl and 1 mM MgCl2. 

RESULTS 

PNAs can effectively block the formation of a PCR product 
containing a complementary target sequence 

Given the higher thermal stability of a PNA/DNA duplex 
compared to the corresponding DNA/DNA duplex we speculated 
that PNA might be able to block PCR in a sequence specific 
manner if targeted against one of the PCR primer sites. Clearly, 
for such a blocking mechanism to work the PNA must compete 
effectively against its cognate PCR primer in binding to their 
common recognition site. To facilitate this requirement, the 
normal 3 step PCR cycle was expanded with a distinct PNA 
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Figure 1. Schematic representation of the PCR cycle profile used in PNA directed 
clamping. 
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Figure 2. Experimental setup and result of a PCR clamping experiment in the 
presence of increasing concentrations of PNA62. Lane 1: amplification of the 
p62-1 plasmid in the absence of PNA62. Lane 2: amplification of the pCKS control 
plasmid (plasmid containing no PNA62 target) in the presence of 17.8 µM PNA62. 
Lanes 3-9: amplification of the p62-1 plasmid in the presence of 17.8 µM (3), 
8.9 µM (4), 2.2 µM (5), 1.1 µM (6), 0.6 µM (7), 0.3 µM (8) and 0.15 µM (9) PNA62. 
PCR cycle conditions were 96°C, 2 min - 65°C, 1 min - 40°C, 30 sec - 60°C, 
2 min - 30 cycles. 



annealing step which 1) precedes the PCR primer annealing step 
and 2) is set at a temperature that allows only the PNA to bind 
to its target sequence (Figure 1). 
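
The four-step cycle can be written down as a small data structure. This is a sketch only: the temperatures and times are those quoted in the figure legends, whereas the step labels (which temperature serves which purpose) are our reading of the text and are not stated explicitly in the source.

```python
# Sketch: the PCR clamping cycle as data, with two simple sanity checks.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    label: str      # assumed role of the step (our interpretation)
    temp_c: float   # temperature in degrees Celsius
    seconds: int    # hold time

CLAMP_CYCLE = (
    Step("denaturation", 96, 120),
    Step("PNA annealing", 65, 60),
    Step("primer annealing", 40, 30),
    Step("extension", 60, 120),
)
N_CYCLES = 30

def total_minutes(steps, cycles: int) -> float:
    """Total hold time in minutes, ignoring temperature ramping."""
    return sum(s.seconds for s in steps) * cycles / 60

def pna_step_precedes_primer_step(steps) -> bool:
    """True if the PNA annealing step comes first and is the hotter of the two."""
    labels = [s.label for s in steps]
    i, j = labels.index("PNA annealing"), labels.index("primer annealing")
    return i < j and steps[i].temp_c > steps[j].temp_c

print(total_minutes(CLAMP_CYCLE, N_CYCLES), pna_step_precedes_primer_step(CLAMP_CYCLE))
```

The second check encodes the two requirements named above: the PNA annealing step precedes primer annealing and is held at the higher temperature.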

Figure 2 shows the experimental setup and result of a PCR 
clamping experiment in the presence of increasing amounts of 
a 15mer PNA, PNA62 (H-TGTACGTCACAACTA-NH2). 
Two plasmid templates were used: the p62-1 plasmid which 
directs the amplification of a 640bp fragment containing a PNA62 
target site and the control plasmid, pCKS, which directs the 
amplification of a 246bp non-target fragment. When PNA62 is 
either absent (lane 1) or present at a concentration of 0.15 µM 
(lane 9) the p62-1 plasmid directs the synthesis of the expected 
640bp PCR fragment. At concentrations at or above 0.3 µM 
PNA62, however, no product is produced (lanes 3 to 8). The 
absence of product is not due to a non-specific inhibitory effect 
of PNA62 on PCR, since even at the highest concentration used 
(17.8 µM) PNA62 does not inhibit the amplification of the expected 
246bp fragment from the pCKS control plasmid (lane 2). 
Furthermore, the ability to clamp PCR is not the result of some 
unique property of PNA62 since similar results could be obtained 
with other mixed sequence PNAs (data not shown). 

Clamping can be accomplished when the PNA target site is 
located between the two PCR primers 

We next analysed whether the PNA would be able to clamp PCR 
independent of the relative position of the PNA and PCR primer 
target sites. We compared the overlap of PNA and PCR primer 
target sites to the situations where the PNA target site is either 
1) located adjacent to a PCR primer site or 2) located in the 
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Figure 3. Experimental setup and effect on PNA62 directed clamping of changing 
the relative position of the PNA and PCR primer target sites. The 3'-end of the 
forward primer is located 31bp downstream of the PNA62 target site and 26bp 
downstream of the PNA176 target site. The 3'-end of the proximal primer is 
located 1bp downstream of the PNA62 target and overlaps the PNA176 by 4bp. 
The p62-1 primer exactly overlaps the PNA62 target site. Lanes 1-3: amplification 
of the p62-1 plasmid with reverse and p62-1 primers in the absence (1) or presence 
of 17.8 µM of PNA62 (2) or 17.8 µM of PNA176 (3). Lanes 4-6: amplification 
of the p62-1 plasmid with reverse and proximal primers in the absence (4) or 
presence of 17.8 µM of PNA62 (5) or 17.8 µM PNA176 (6). Lanes 7-9: 
amplification of the p62-1 plasmid with reverse and forward primers in the absence 
(7) or presence of either 17.8 µM PNA62 (8) or 17.8 µM PNA176 (9). PCR cycle 
conditions were 96°C, 2 min - 65°C, 1 min - 40°C, 30 sec - 60°C, 2 min - 30 
cycles. 



middle of the PCR region. There are fundamental differences 
in the underlying mechanism of clamping in these three cases. 
When the PNA and PCR primer target sites overlap, clamping 
operates by 'primer exclusion'. Conversely, when the target site 
is located at a distance from the PCR primer sites, clamping is 
expected to operate by preventing read-through by the Taq 
polymerase ('elongation arrest'). Finally, when the PNA target 
is located adjacent to the PCR primer site, clamping is likely 
to operate by either preventing polymerase access to the PCR 
primer and/or by preventing initiation of primer elongation. 
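
The three cases distinguished above can be expressed as a simple rule on the relative positions of the two sites. The sketch below is illustrative only: the coordinates are hypothetical, and the 5bp cut-off separating 'adjacent' from 'distant' sites is our assumption, not a value given in the text.

```python
# Sketch: classify the expected clamping mechanism from site positions.
# Sites are (start, end) half-open intervals on a common coordinate axis.
def clamping_mode(pna_site, primer_site, adjacent_max_gap: int = 5) -> str:
    p_start, p_end = pna_site
    r_start, r_end = primer_site
    if p_start < r_end and r_start < p_end:
        return "primer exclusion"            # sites overlap
    gap = max(r_start - p_end, p_start - r_end)  # bases between the two sites
    if gap <= adjacent_max_gap:              # hypothetical threshold
        return "adjacent (blocked polymerase access or initiation)"
    return "elongation arrest"               # polymerase must read through the PNA

PNA_SITE = (100, 115)                        # hypothetical 15-base PNA target
print(clamping_mode(PNA_SITE, (100, 115)))   # identical site
print(clamping_mode(PNA_SITE, (116, 128)))   # 1 base away
print(clamping_mode(PNA_SITE, (146, 159)))   # 31 bases away
```

The gaps of 1 and 31 bases in the usage lines were chosen to mirror the spacings of the proximal and forward primers relative to the PNA62 target described in the Figure 3 legend.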

Figure 3 shows the experimental setup and the result of 
changing the relative position of the PNA and PCR primer target 
sites. Using PNA62, clamping can be accomplished efficiently 
when the PNA target site either overlaps (lane 2) or is located 
adjacent to a PCR primer site (lane 4). However, when the PNA 
target site is located at a distance from the PCR primer site no 
clamping is observed after 30 cycles (lane 8) suggesting that this 
PNA/DNA complex is unable to prevent read-through by the 
polymerase. To test whether an extended PNA62, with an 
increased Tm for its complementary DNA target, was capable 
of clamping we synthesized PNA176 
(H-GATCCTGTACGTCACAACTA-NH2) which is complementary to the PNA62 target 
plus the first 5 flanking base pairs in the plasmid. As shown in 
Figure 3, PNA176 efficiently clamps the PCR process independent 
of the position of the PNA target site; lane 3: overlapping 
PNA and PCR primer target sites, lane 6: proximal PNA and 
PCR primer target sites and lane 9: widely spaced PNA and PCR 
primer target sites. In experiments with another DNA target 
sequence we have found that a shorter PNA than PNA62, with 
a correspondingly lower Tm, is successful in clamping its cognate 
PCR when its target site is located at a distance from the PCR 
primers (data not shown). Thus, it would appear that this 
clamping ability may be a complex function of affinity and kinetics 
of dissociation. 



[Figure 4 plot: ΔTm (°C; axis ticks 4 to 20) for single A, G, C or T 
mismatches at positions 6-9 of the PNA62/DNA duplex.] 

Figure 4. Schematic representation of the effect on Tm of introducing single base 
mismatches in a PNA62/DNA complex. A series of anti-parallel DNA 
oligonucleotides were synthesized which contained different single base mismatches 
to the PNA62 at position 6-9. The Tm value of the fully complementary 
PNA/DNA duplex is 69°C. The ΔTm values shown in the Figure indicate the 
reduction in thermostability that results from the introduction of single base 
mismatches in the helix. 



PNA directed clamping can be used to analyse single base 
mutations 

To explore the potential of the method we tested whether PNA62 
would be able to discriminate between fully complementary and 
single base mismatch targets in a mixed target PCR (i.e. where 
both targets are present in the PCR reaction mix). As shown in 
Figure 4 single base mismatches in the PNA62/DNA duplex 
lower the thermostability of the complex by 8-20°C depending 
on the type of mismatch and its position in the duplex. Based 
on these data we chose to analyse mutations at position 6 since 
these mutations span the largest temperature interval and also 
include the mutation that exhibits the least helix destabilizing 
effect, i.e. the PNA G/T DNA mutation (ΔTm of 8°C). 

The configuration chosen for the point mutation analysis was 
primer exclusion. The PCR reaction mix contained the p62-1 
wildtype plasmid, the appropriate mutant p62-plasmid, primers 
specific for the wildtype and the mutant plasmids and the common 
reverse primer. Figure 5 shows the experimental setup and the 
result of the PNA G/T DNA mutation analysis. In the absence 
of PNA62 two PCR products of sizes corresponding to 
amplification of the p62-1 and p62-A-KS are produced (lanes 
1, 3, 5, 7 and 9). In the presence of PNA62, however, the 
synthesis of a PCR product is dependent on the size of the mutant 
primer. When the mutant primer has a size similar to PNA62, 
addition of the PNA oligomer will suppress the amplification from 
both wildtype and mutant plasmid (lane 2). This is because the 
duplex between the mutant primer and its complementary target 
is less thermostable (Tm = 52°C) than the mismatched 
PNA62/mutant duplex (Tm = 61°C, Fig. 4). However, as the 
size of the mutant primer is increased its ability to compete with 
PNA62 for binding to its target sequence increases. Thus, at a 




Figure 5. Optimization of the size of the mutant primer required to carry out 
selective PNA62 directed clamping of the wildtype p62-1 plasmid in the presence 
of the p62-A-KS single base mutated plasmid. Each reaction contains the p62-1 
and p62-A-KS plasmid, the p62-1 primer, one of the p62-A-1 to 5 primers, the 
common reverse primer and 8.9 µM PNA62. Lanes 1-2: amplification using 
the p62-A-1 primer in the absence (1) or presence (2) of PNA62. Lanes 3-4: 
amplification using the p62-A-2 primer in the absence (3) or presence (4) of 
PNA62. Lanes 5-6: amplification using the p62-A-3 primer in the absence (5) 
or presence (6) of PNA62. Lanes 7-8: amplification using the p62-A-4 primer 
in the absence (7) or presence (8) of PNA62. Lanes 9-10: amplification using 
the p62-A-5 primer in the absence (9) or presence (10) of PNA62. PCR cycle 
conditions were 96°C, 2 min - 65°C, 1 min - 40°C, 30 sec - 60°C, 2 min - 30 
cycles. 
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Figure 6. Point mutation analysis with PNA62. Each PCR reaction contains the 
p62-1 plasmid and one of the single base mutated plasmids p62-A-KS, p62-T-KS 
or p62-C-KS, the common reverse primer, the appropriate allele specific 
primers (p62-1 primer, p62-A-5 primer, p62-T-4 primer, p62-C-4 primer) and 
8.9 µM PNA62. Lanes 1-2: co-amplifications of the p62-1 and p62-A-KS plasmids 
in the absence (1) and presence (2) of PNA62. Lanes 3-4: co-amplifications 
of the p62-1 and p62-T-KS plasmids in the absence (3) and presence (4) of PNA62. 
Lanes 5-6: co-amplifications of the p62-1 and p62-C-KS plasmids in the absence 
(5) and presence (6) of PNA62. PCR cycle conditions were 96°C, 2 min - 65°C, 
1 min - 40°C, 30 sec - 60°C, 2 min - 30 cycles. 

mutant primer size of +8 nucleotides (relative to the 15mer 
PNA62) the p62-A-KS plasmid directs the amplification of a small 
amount of PCR product (lane 8) and at a mutant primer size of 
+10 this amplification product is readily visible in the gel (lane 
10). Even at a size of +10, however, the mutant primer will 
not prevent PNA62 from clamping its wildtype target as shown 
by the lack of the 640bp band in lane 10. 
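
The competition argument used here can be reduced, as a first approximation, to a comparison of melting temperatures. The sketch below is a deliberate simplification that ignores concentrations and binding kinetics; the 69°C, 8°C and 52°C values are taken from the text, while the 65°C value for a longer primer is purely hypothetical and only serves to show the other branch.

```python
# Sketch: Tm-only view of PNA versus primer competition (a simplification).
PNA62_TM_MATCHED_C = 69.0   # fully complementary PNA62/DNA duplex
DELTA_TM_G_T_C = 8.0        # PNA G / DNA T mismatch at position 6

def pna_outcompetes(pna_tm_c: float, primer_tm_c: float) -> bool:
    """Assume the more thermostable complex occupies the shared site."""
    return pna_tm_c > primer_tm_c

mismatched_pna_tm = PNA62_TM_MATCHED_C - DELTA_TM_G_T_C   # 61.0

# 15mer mutant primer (Tm 52°C, from the text): amplification is suppressed.
short_primer_suppressed = pna_outcompetes(mismatched_pna_tm, 52.0)

# Hypothetical longer mutant primer with an assumed Tm of 65°C: not suppressed,
# while the wildtype site (PNA Tm 69°C) would still be clamped.
long_primer_suppressed = pna_outcompetes(mismatched_pna_tm, 65.0)
wildtype_still_clamped = pna_outcompetes(PNA62_TM_MATCHED_C, 65.0)

print(mismatched_pna_tm, short_primer_suppressed,
      long_primer_suppressed, wildtype_still_clamped)
```

On this simplified view, selective clamping requires a mutant primer whose Tm falls between that of the mismatched and the fully matched PNA complex.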

Using a similar experimental setup we then determined the 
optimal size for the T-mutant and C-mutant primers. These data 
are compiled in Figure 6 which shows that PNA62 is able to 
carry out selective suppression of its fully complementary 
sequence in the presence of all 3 possible point mutations at the 
position analysed; lane 2: PNA G/T DNA mismatch, lane 4: 
PNA G/A DNA mismatch and lane 6: PNA G/G DNA 
mismatch. 

Clamping with homopyrimidine PNAs. 

In the examples described above the PNAs contained both purine 
and pyrimidine nucleobases. Such mixed sequence PNAs form 
highly thermostable duplexes in a preferred anti-parallel 
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Figure 7. Single base mismatch analysis with two homopyrimidine PNAs 
complementary to a target sequence located at a distance from the PCR primers. 
Each PCR reaction contains the pT10KS, pT9C and pCKS-1 (control) plasmids 
and PNAs as indicated. Lane 1: co-amplifications in the absence of PNAs. Lane 
2: co-amplifications in the presence of 3.3 µM PNA H-T10-LysNH2. Lane 3: 
co-amplifications in the presence of 13.2 µM PNA H-T4CT5-LysNH2. Lane 4: 
co-amplification in the presence of 3.3 µM PNA H-T10-LysNH2 and 13.2 µM PNA 
H-T4CT5-LysNH2. PCR cycle conditions were 96°C, 2 min - 62°C, 
3 min - 40°C, 1 min - 65°C, 2 min - 35 cycles. 



orientation with their target DNA sequence (17). In contrast to 
this binding mode, homopyrimidine PNAs form extremely 
thermostable (PNA)2/DNA triplexes with a preference for 
parallel orientation with their target DNA sequence (15-16). 
We therefore also wished to study the PCR clamp technique with 
such triplex forming PNAs. To obtain thermal stabilities 
comparable to the previously described PNA62/DNA duplex we 
chose two 10mer PNAs (PNA H-T10-LysNH2, Tm = 75°C and 
H-T5CT4-LysNH2, Tm = 79°C) which differ from each other at 
a single base position. Both of these PNAs acted as efficient and 
sequence specific clamps in a PCR amplification process and both 
PNAs were very efficient in blocking their cognate PCR process 
independent of the location of the PNA and PCR primer sites 
(data not shown). Furthermore, as shown in Figure 7 both 
homopyrimidine PNAs were able to discriminate between their 
fully complementary and single base mismatch targets when the 
PNA target site is located at a distance from the PCR primer 
sites. In the absence of either of the PNAs three PCR products 
of sizes corresponding to amplification of the pT10KS, pT9C 
and pCKS-1 control plasmids are visible in the gel (Figure 7, 
lane 1). If PNA H-T10-LysNH2 is included in the PCR reaction 
alone the products corresponding to amplification of the pT9C 
and pCKS-1 control plasmids are seen (lane 2). Similarly, PNA 
H-T4CT5-LysNH2 suppresses the amplification of its cognate 
target fragment, whilst leaving amplification of the pT10KS and 
pCKS-1 control plasmids unaffected (lane 3). In the presence of 
both PNAs only the PCR product corresponding to the pCKS-1 
control plasmid is seen. 

DISCUSSION 

We have used PNA to develop a method that converts a PCR 
amplification process into an efficient analytical tool for the direct 
detection of single base mutations. In our hands the method is 
very robust. The two different modes of PCR blocking by PNA 
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(primer exclusion and elongation arrest) and the two different 
ways in which PNA target recognition can occur (duplex vs. 
triplex), further provide great versatility and flexibility to the 
PNA/PCR clamp system. 

It is interesting to note that clamping can operate efficiently 
even with incomplete binding of PNA to its DNA targets. For 
example, our calculations show that if 1% of all target sequences 
escape clamping in each cycle the maximum amplification factor 
after 30 cycles is only 9-fold, which will not generally produce 
a detectable signal on a gel. Indeed, in PCR amplifications of 
genomic material we predict that as much as 10% of the target 
sequences can escape clamping without generating a detectable 
signal (equivalent to a maximum amplification factor of 2500 
in 30 cycles). 
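
The calculation behind these figures is not given in the text. One simple model that yields numbers of the same order is sketched below; it is our assumption, not the authors' stated method. It tracks the two strands separately: the strand carrying the clamped primer site is copied only when the clamp is escaped, whereas the complementary strand is copied in every cycle by the unblocked primer.

```python
# Sketch: two-strand model of amplification with incomplete clamping.
# Assumption (ours): only the fraction `escape_fraction` of strands carrying
# the PNA/primer site is copied per cycle; the other strand is always copied.
def product_fold(escape_fraction: float, cycles: int = 30) -> float:
    site_strands = 1.0    # strands bearing the clamped primer site
    other_strands = 1.0   # complementary strands, primed without hindrance
    for _ in range(cycles):
        new_site = other_strands                    # copied from the other strand
        new_other = escape_fraction * site_strands  # copied only on clamp escape
        site_strands += new_site
        other_strands += new_other
    # Full-length double-stranded product is limited by the rarer strand.
    return other_strands

for f in (0.0, 0.01, 0.10, 1.0):
    print(f, round(product_fold(f), 1))
```

With no clamping (escape fraction 1.0) the model reduces to ordinary doubling, 2^30 after 30 cycles; with 1% and 10% escape it gives factors of roughly ten and roughly 2500 respectively, in line with the orders of magnitude quoted above.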

In a PNA clamping protocol with mixed sequence PNAs we 
prefer to use the primer exclusion principle for the following 
reasons. First, this clamping mode places the least physical 
demands on the PNA, i.e. clamping does not require that the 
PNA, once bound to its target, is able to prevent read-through 
by the polymerase as is the case in the elongation arrest clamping 
mode. Second, the only variables in the primer exclusion 
clamping mode are the Tm of the PNA and the PCR primers and 
these can be tuned to precision simply by changing either the 
sizes of the PNA and PCR primers, or by altering their exact 
position on the target DNA. Third, when using the primer 
exclusion principle there is the further advantage that, in addition 
to blocking its cognate target site, the PNA will compete with 
the PCR primer for any cryptic primer sites in the genome, 
thereby suppressing any occurrence of non-specific background 
in the PCR process directed by this primer. 

In order to target unique sequences in the human genome a 
primer of at least 17bp is usually required, the Tm's of which 
typically range between 50-60°C. Thus, for the successful 
projection of our PNA clamping approach to the analysis of point 
mutations in the human genome, PNAs with Tm's above 60°C 
must be able to effectively discriminate between their fully 
complementary and single base mismatched target DNA. Using 
a mixed sequence 15mer PNA with a Tm of 69°C we have 
shown, in a model system, that three different point mutations 
at a single position can be discriminated, suggesting that PNA 
clamping can be used as an effective diagnostic tool for the 
analysis of mutations in complex genomes. We acknowledge that 
only three out of twelve possible mismatches have been analysed 
and only in a single sequence context. However, we have shown 
in the present study that the PCR clamp system can efficiently 
discriminate the most difficult case in our system (PNA G/T 
DNA mismatch with a ΔTm of only 8°C). We are therefore 
confident that, given the great flexibility of the method, conditions 
can be found to discriminate any single point mutation. When 
such mutations are present at low frequency it is interesting to 
speculate that the PNA clamp may instead be directed against 
the large excess of non-mutated genes the presence of which leads 
to unwanted background in diagnostic procedures. We now intend 
to apply the PCR clamp technique to the analysis of point 
mutations in human, animal and plant genetic material. 
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A High-Resolution Microsatellite Map 
of the Mouse Genome 




The European Collaborative Interspecific Backcross (EUCIB) resource was constructed for the purposes of 
high-resolution genetic mapping of the mouse genome (Breen et al. 1994). The large Mus [...] 

[...] 3368 microsatellites. The microsatellites are distributed among 2302 
bins with 1.46 markers per bin on average. Average bin separation is 0.61 cM. This high-resolution genetic map 
will aid the construction of a robust physical map of the mouse genome. 

and provide a new cache of gene sequences that can 
be related to loci on the human genome by the con- 
served linkage groups identified between the two 
species (Copeland et al. 1993; Andersson et al. 
1996). Nevertheless, there are no complete physical 
maps yet available for any mouse chromosome. The 
development of a high-resolution genetic map can 
enhance the production of a robust physical map on 
any mouse chromosome. 

The ability to undertake large genetic crosses 
between defined mouse strains means the construc- 
tion of high-resolution genetic maps can be readily 
achieved. Most notably, large interspecific or inter- 
subspecific backcrosses between laboratory strains 
of mice and wild species such as Mus spretus or Mus 
castaneus (Avner et al. 1988) have transformed mouse 
genetic mapping (for review, see Copeland et al. 
1993). Large numbers of backcross progeny can be 
readily derived from such crosses providing the req- 
uisite numbers of meioses to achieve high genetic 
resolution. Additionally, the use of crosses between 
relatively diverged species contributes to the large 
numbers of markers that are variant between the 
parental strains. Approximately 90% of microsatel- 
lites show size variation between laboratory strains 
and the wild species, M. spretus and M. castaneus 
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grriouse is a pivotal model organism for the ge- 
~1J>rogram and with its battery of mutagenic, 
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gemc, and developmental biology approaches 
will play a key role as mammalian genetics 
moves from genomics to studies of gene function 
(Copeland et al. 1993; Dietrich et al. 1995; Brown 
[...] 1996). The development of genetic and 
physical maps in the mouse is an important step 
in providing the genome resources for future 
gene function studies (Dietrich et al. 1995). The 
construction of a high density map of mouse 
simple-sequence polymorphisms at intermediate 
resolution is complete (Dietrich et al. 1996). The 
development of genome-wide physical maps 
([...] et al. 1996) will assist gene mapping as well 
as providing the clone resources for gene identification. 
In addition, the map will provide the substrate 
for preparing sequence-ready maps for comparative 
sequencing, which will itself speed the process 
of gene identification. Physical maps will also 
underpin the development of comprehensive, high-density 
transcript maps (Schuler et al. 1996) that [...] 
for the characterization of mouse mutations 

K s : • ,>|, own(S>har.mrt.ac.uk; FAX 0123S 824542. 
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(Dietrich et al. 1992). Small intersubspecific crosses 
have been used for the construction of genome-wide 
gene or microsatellite maps at intermediate 
resolution (Dietrich et al. 1996; see above). Large 
backcrosses of 1000 progeny or more 
carrying a specific mutation of interest have been 
used widely for high-resolution genetic mapping of 
the mutant locus as a route to positional cloning of 
the gene (Brown 1994, 1996). However, to date 
there has been no systematic attempt to use the 
high resolution afforded by large interspecific backcrosses 
to construct genome-wide high-resolution 

Recently, we reported the construction of a 
high-resolution mouse mapping resource consisting 
of an interspecific backcross of nearly 1000 progeny 
(Breen et al. 1994). A backcross of this size has a 
genetic resolution of 0.3 cM at the 95% confidence 
level that equates to ~600 kb in the mouse genome. 
We have now used this backcross to construct a 
high-resolution and high-density microsatellite 
map of the mouse genome. This high-resolution genetic 
map will be the anchor for the construction of 
a high integrity physical map of the mouse genome. 
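
The quoted resolution can be reproduced with a short calculation. This is a sketch under stated assumptions: resolution is taken as the smallest interval in which at least one recombinant is expected with 95% probability among N meioses, N is set to the 982 progeny reported for the cross, and the conversion of about 2000 kb per cM is a genome-wide average that we assume here rather than a figure given in the text.

```python
# Sketch: genetic resolution of a backcross at a given confidence level.
def resolution_cm(n_meioses: int, confidence: float = 0.95) -> float:
    """Smallest interval (cM) with probability >= confidence of containing at
    least one crossover among n_meioses, using P(none) = (1 - d) ** n."""
    d = 1.0 - (1.0 - confidence) ** (1.0 / n_meioses)  # recombination fraction
    return 100.0 * d

KB_PER_CM = 2000.0  # assumed genome-wide average for the mouse

res_cm = resolution_cm(982)
res_kb = res_cm * KB_PER_CM
print(round(res_cm, 3), round(res_kb))
```

For 982 meioses this gives about 0.3 cM, or roughly 600 kb under the assumed conversion, matching the values stated above.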



RESULTS 

Identification of Panels of Recombinants from EUCIB 
for High-Resolution Mapping 

We have described the construction of a large interspecific 
backcross between C57BL/6 and M. spretus, 
the European Collaborative Interspecific Backcross 
(EUCIB), comprising 982 backcross progeny. 
Backcross progeny were initially typed for [...] primary 
anchor loci spanning the entire genome with 
3-6 anchors per chromosome (Breen et al. 1994). 
Subsequently, a number of additional anchor markers 
were added. Where it became apparent from further 
mapping studies that the proximal and distal 
anchors available on a particular chromosome did 
not represent either the most centromeric or telomeric 
markers, additional anchors were added for 
the mapping of markers close to the centromere and 
telomere. The anchor map identifies the great majority 
of backcross progeny mice recombinant in 
any interanchor interval (excluding only those [...] 
double recombinants) 
and provides panels of mice for high-resolution 
mapping in each interanchor chromosome 
region. In total, 93 primary anchors were assigned 
(see Table 1). Subsequently, a large number 
of secondary anchors (principally microsatellite 
markers) were added to the map, reducing the size of 
r >*l JCt.NOME RESr.ARCH 



recombinant panels still further and allowing for 
rapid high-resolution mapping of markers in any 
chromosome region. Final interanchor intervals 
comprised panels of ~36 recombinant mice on average 
and thus corresponded to a genetic interval of 

Mapping Microsatellites at High-Resolution 
on the EUCIB Backcross 



To develop the high resolution genetic map, a large 
number of microsatellite markers from the Whitehead/MIT 
map (Dietrich et al. 1996) were analyzed 
through the EUCIB backcross. The bulk of microsatellite 
mapping used a novel, high-throughput and 
semiautomated fluorescent dUTP genotyping approach 
(Rhodes et al. 1997); 2278 of the total of 
3368 were added to the map by this approach. The 
remainder were mapped either by use of standard 
agarose gel electrophoresis or alternatively with a 
chemiluminescence approach (Vignal et 
al. 1993). Following the determination of the parental 
allele sizes (C57BL/6 and M. spretus), the appropriate 
recombinant panel of mice was genotyped. 
Given the limited resolution afforded by previous 
maps, it was not always apparent which interanchor 
interval a microsatellite would lie within and, therefore, 
which recombinant panel should be typed. 
Under these circumstances, appropriate adjacent 
panels were typed. 

Although we tested all the Whitehead/MIT
primers available during the period of map construction,
inevitably a proportion failed to amplify
product or proved problematic for reliable scoring
and mapping. For the 4450 microsatellite markers
tested by the semiautomated fluorescent dUTP
genotyping approach, 51.2% amplified and were
mapped successfully. Of the 48.8% that failed to be
added to the map, 14.7% failed to give any product
whatsoever (on either C57BL/6 or M. spretus DNA)
and 31.1% produced some product but were not
scorable (e.g., multiple bands or variable product
sizes). A small percentage, 3.0%, gave reliable, but
identical, products between C57BL/6 and M. spretus
DNA and were therefore not mappable.
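
The outcome percentages quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only figures given in the text (4450 primer pairs tested; 51.2% mapped; 14.7%, 31.1%, and 3.0% in the three failure classes). The variable names are illustrative and are not part of the EUCIB or MBx software.

```python
# Outcome classes for the 4450 microsatellite primer pairs tested with the
# fluorescent dUTP approach, as percentages quoted in the text.
TESTED = 4450
outcomes_pct = {
    "mapped": 51.2,
    "no_product": 14.7,
    "unscorable_product": 31.1,
    "identical_alleles": 3.0,
}

# Sum of the three failure classes; should reproduce the quoted 48.8%.
failed_pct = sum(p for name, p in outcomes_pct.items() if name != "mapped")

# Number of markers implied by the 51.2% success rate.
mapped_count = round(TESTED * outcomes_pct["mapped"] / 100)

print(f"failed: {failed_pct:.1f}%")
print(f"mapped markers: {mapped_count}")
```

The rounded count agrees with the 2278 markers that the text reports as added to the map by this approach.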

Given the high-throughput requirements of the
project, we did not return to failed primer sets to
optimize PCR conditions, and as a consequence, a
proportion of microsatellites were not added to the
map on each chromosome. Ultimately, of the
Whitehead/MIT microsatellites available to us during
map construction, 56% were added to the EUCIB
map. For individual chromosomes, the proportion
of Whitehead/MIT microsatellites placed on
Table 1. Summary of Markers and Map Statistics by Chromosome
The number of anchors and microsatellites mapped to each chromosome is indicated. Percent MIT indicates the proportion
of Whitehead/MIT microsatellites available to us at the time of mapping that were added to the EUCIB genetic map. In addition, the
number of marker bins for each chromosome is given along with the genetic map length (the distance between the most proximal
and distal anchors mapped; see Methods).



the EUCIB map varied from ~45% to 70% of Whitehead/MIT
microsatellite markers (see Table 1).

The MBx Database— Construction of the EUCIB 
High-Resolution Microsatellite Map 

The MBx database that supports the EUCIB program 
has been described previously (Breen et al. 1994). 
Genotypes were entered into the MBx database and 
genetic maps were constructed. Determining locus 
order rather than genetic distance was the primary 
consideration for the construction of genetic maps 
because this provides the most important enhance- 
ment to future physical maps that will be under- 
pinned by the high-resolution genetic map. The or- 
der of microsatellite markers along each chromo- 
some was determined by a haplotype analysis that 
minimizes the recombinants in any chromosomal 
region. The genetic distances displayed in the EU- 
CIB Genetic Map and MultiMaps are calculated so 
that the marker order derived by haplotype analysis 



is maintained (see Methods); these displays (see 
Fig. 1) are available on the World Wide Web
site (URL: http://www.hgmp.mrc.ac.uk/MBx/MBxHomepage.html),
which also includes genotype
data for individual markers on the maps. Direct
access to the MBx database to view haplotypes is 
also available (see Methods). The MBx database pro- 
vides scrollable tables of haplotypes for each chro- 
mosome, identifying and highlighting all recombi- 
nation events. It is possible to select and display all 
mice containing recombination events in a particu- 
lar interval to assess the raw data and to evaluate 
how robust locus order is. 

The EUCIB High-Resolution Genetic Map 

In total, 3368 microsatellites and anchors have been 
mapped and ordered at high resolution on the EU- 
CIB backcross (see Table 1). The total length of the 
high resolution EUCIB genetic map is 1398 cM as 
calculated from proximal-distal anchor distances 
on each chromosome (Table 1). On each chromo- 
note when comparing the EUCIB and Whitehead/
MIT maps that for the Whitehead/MIT map (Diet- 
rich et al. 1994) constructed from a limited number 
of F2 intercross progeny, statistical support for order
of a given marker could vary, either because of in- 
complete genotyping or because the marker is 
dominant rather than codominant. Thus multimap 
comparisons of Whitehead/MIT and EUCIB maps 
illustrate, not surprisingly, that markers lying in
adjacent bins on the Whitehead/MIT map are some- 
times found to interdigitate when their order is de- 
termined at high resolution on the EUCIB map (see 
Fig. 1). For this reason, we have chosen a relatively 
high cutoff point to assess the frequency of markers 
that deviate in position between the EUCIB and 
Whitehead/MIT maps. Proceeding systematically 
proximal to distal on each chromosome, we co- 
aligned each EUCIB primary or secondary anchor 
marker with the corresponding locus on the White- 
head/MIT map, and then assessed each microsatel- 
lite lying in the following inter-anchor segment on 
the EUCIB map for significant deviations with the 
Whitehead/MIT map. (Only markers lying between 
the most proximal and distal anchors on each chro- 
mosome were subject to this analysis to avoid biases 
from the few poorly mapped markers lying outside 
these anchor loci.) Choosing a cutoff point of 10 
cM, only 76 markers (2.3%) on the EUCIB map 
mapped 10 cM or more from their expected location 
on the basis of the Whitehead/MIT map. At 15 and 
20 cM, this figure dropped to 1.1% and 0.8%, re- 
spectively. Overall, there is excellent agreement be- 
tween Whitehead/MIT and EUCIB maps. 



Distribution of Markers and Recombination Events 

Only two large bins, one of 11 and one of 12 markers
(see Table 2), remain on the final EUCIB map.
We examined these bins to see if they corresponded 
to any of the large bins on the Whitehead/MIT map 
that may be accounted for by regions of crossover 
suppression that are common between the two 
crosses used for mapping (the Whitehead/MIT map 
was constructed by use of a (C57BL/6J-ob/ob × Mus
castaneus) F2 intercross; see Dietrich et al. 1996).
The bin of 11 markers is found on chromosome 16
at position 6.34 cM and includes the microsatellites
D16Mit8, 32, 33, 79, 121, 122, 131, 142, 161, 180,
and 181. The bin of 12 markers is found on chromosome
2 at position 32.23 cM and includes the
microsatellites D2Mit8, 72, 89, 155, 157, 240, 241, 
298, 323, 373, 433, and 471. However, neither of 
these bins corresponded to large, unseparated bins 
of markers on the Whitehead/MIT map (Dietrich et
al. 1996). 

We have also examined the EUCIB maps to determine
the largest genetic gaps. Only five gaps >5
cM were identified between the most proximal and
distal anchors. On chromosome 1, a gap of 8.59 cM
separates D1Mit65 and D1Mit118. On chromosome
5, a gap of 7.38 cM was found to separate D5Mit160
and D5Nds6, whereas on chromosome 9 two gaps,
between D9Mit217 and D9Mit58 (6.60 cM) and
D9Mit294 and D9MM2 (5.44 cM), were identified.
On chromosome 18, a gap of 9.68 cM was identified
between D18Mit33 and D18Mit8. None of these
gaps corresponded to any of the larger genetic intervals
on the Whitehead/MIT map (Dietrich et al.
1996; see Discussion).

In total, 17,029 distinct recombination events 
were observed in the MBx database among the 982 
progeny and distributed across all 20 chromosomes 
(see Table 3). On average, each mouse carries 17.3 
recombination events. Forty-four percent of chro- 
mosomes did not show an observable recombina- 
tion event. Forty percent of chromosomes (7863) 
showed a single recombinant, 7.2% of chromo- 
somes demonstrated double recombinants, and 
5.6% triple recombinants. We have used a good- 
ness-of-fit test to analyze the distribution of recom- 
binant classes on each chromosome for fit to Pois- 
son. None of the 20 chromosomes fit a Poisson dis- 
tribution, differing significantly in all cases (see 
Table 3). As observed for the original EUCIB anchor 
map (Breen et al. 1994), and in agreement with 
other reports (Ceci et al. 1989; Saunders and Seldin
1990; Nadeau et al. 1991; Reeves et al. 1991, 1997), 
there is a general over-representation of single re- 
combinants, and double recombinants are under- 
represented on all chromosomes — probably because 
of crossover suppression. In general, triple recombi- 
nant classes and classes carrying larger numbers of 
recombinants are over-represented — probably 
largely because of genotyping errors (see below). In 
total, 10,991 chromosomes (56%) carried one or 
more crossovers. In general, there was a broad rela- 
tionship between a chromosome's genetic length 
and the total number of recombinant events ob- 
served for that chromosome. Overall, however, the 
relationship between numbers of recombinants per 
chromosome and genetic length across all 20 chro- 
mosomes was not significant (see legend to Table 3). 
A binomial test to identify those chromosomes that 
contribute significantly fewer or greater recombi- 
nants than expected indicates that chromosomes 1, 
7, 8, and 18 contribute significantly more recombi- 
nants than expected, whereas chromosomes 3, 11, 
15, 16, and 17 contribute significantly fewer (see
legend to Table 3). 
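
The summary figures in this paragraph can be reproduced from the counts given in the text. The Python sketch below also compares the quoted class frequencies with a Poisson distribution. One caveat: it pools all 20 chromosomes into a single mean, which is an illustrative simplification made here, whereas the goodness-of-fit tests reported in Table 3 were carried out chromosome by chromosome. All names are hypothetical.

```python
import math

PROGENY = 982
CHROMOSOMES_PER_MOUSE = 20          # 19 autosomes plus the X chromosome
TOTAL_RECOMBINATIONS = 17029

chromosomes_scored = PROGENY * CHROMOSOMES_PER_MOUSE
mean_per_mouse = TOTAL_RECOMBINATIONS / PROGENY
mean_per_chromosome = TOTAL_RECOMBINATIONS / chromosomes_scored

# Observed fractions of chromosomes carrying 0, 1, 2, and 3 recombinants,
# taken from the percentages quoted in the text.
observed = {0: 0.44, 1: 0.40, 2: 0.072, 3: 0.056}


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Pooled-genome Poisson expectation (an assumption of this sketch).
expected = {k: poisson_pmf(k, mean_per_chromosome) for k in observed}

print(f"chromosomes scored: {chromosomes_scored}")
print(f"mean recombinations per mouse: {mean_per_mouse:.1f}")
for k in observed:
    print(f"{k} recombinants: observed {observed[k]:.3f}, Poisson {expected[k]:.3f}")
```

Under this pooled approximation the single-recombinant class exceeds the Poisson expectation, the double class falls well below it, and the triple class is slightly above it, which is the same qualitative pattern the text describes.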



Error Correction 

Multiple recombinants observed on an individual
chromosome (see above) can be characteristic of
genotyping errors. For this reason, we have sought
to identify those multiple recombinants that may
be most indicative of genotyping errors to reach
some assessment of the overall level of genotyping
error in the EUCIB dataset. Error correction on each
chromosome has proceeded by the identification
of haplotypes 1(.)2(.)1 and 2(.)1(.)2 for adjacent
markers within the database, where 1 represents a
homozygote, 2 a heterozygote, and (.) represents a
variable number of intervening markers for which
genotype information might not be available in any
haplotype (see legend to Table 4). These apparently
double recombinant haplotypes for very closely
linked markers would be expected to occur rarely, if
at all, and contribute to a proportion of double and
triple recombinant chromosomes and to chromosomes
carrying larger numbers of recombinants (see
above). Subsequently, having identified these
double recombinant haplotypes, primary data entry
is checked, or in some cases scorings are repeated.
Eventually, when error correction is complete, reordering
of markers is carried out by MBx. Nevertheless,
following error correction and reordering,
some aberrant haplotypes remain (see Table 4 for
the total number of spurious double recombinant
haplotypes per chromosome). In total, 1158 double
recombinants of the form 1(.)2(.)1 or 2(.)1(.)2 remain.
Across the whole dataset, 41% of these double
recombinants are of the form 121 or 212 with no
intervening unscored markers. Sixty-six percent of
double recombinants are of the form 121 or 212, or
1(.)2(.)1 or 2(.)1(.)2 in which only a single intervening
marker (.) is unscored. The high frequency of
these apparently closely spaced double recombinants
is very suggestive of genotyping errors rather
than true recombination events.
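
The haplotype search described above can be sketched in Python as follows. This is a minimal illustration, not the MBx implementation, which the text does not describe: the function name, the list-based haplotype representation (1 for homozygote, 2 for heterozygote, None for an unscored marker), and the default limit of 10 unscored intervening markers (taken from the legend to Table 4) are assumptions of the sketch.

```python
def find_aberrant_doubles(genotypes, max_gap=10):
    """Return index triples (left, middle, right) of 1(.)2(.)1 / 2(.)1(.)2 patterns.

    genotypes: ordered genotypes along one chromosome of one mouse, coded
               1 (homozygote), 2 (heterozygote), or None (not scored).
    max_gap:   largest number of unscored markers tolerated on either side
               of the middle marker.
    """
    # Keep only scored markers, remembering their positions in the map order.
    scored = [(i, g) for i, g in enumerate(genotypes) if g is not None]
    hits = []
    # Slide over consecutive scored markers three at a time.
    for (i, a), (j, b), (k, c) in zip(scored, scored[1:], scored[2:]):
        flanks_match = a == c and a != b
        gaps_ok = (j - i - 1) <= max_gap and (k - j - 1) <= max_gap
        if flanks_match and gaps_ok:
            hits.append((i, j, k))
    return hits


# A single discordant marker flanked by concordant scored neighbours is flagged.
print(find_aberrant_doubles([1, 1, 2, 1, None, 1, 2, 2]))
# Unscored markers between the flanks and the middle marker are tolerated.
print(find_aberrant_doubles([2, None, 1, None, None, 2]))
```

Only a single scored marker that disagrees with both scored neighbours is flagged; a run of two or more discordant markers, which is more plausibly a genuine pair of crossovers, is left alone.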

In general, if we assume that all aberrant haplotype
events represent genotyping errors, then the
average genotyping error rate across the whole genome
is ~0.01. The rate varies from chromosome to
chromosome; on some chromosomes,
for example, chromosome 16, it is as low as 0.002.
However, we have undertaken additional analyses
to empirically estimate the residual error rate for
genotypes within the EUCIB dataset. Additionally,
we have attempted to estimate the overall error rate
for marker order across the genome.



Table 4. Number and Chromosome Distribution of Aberrant Double and
Triple Recombinants

Chromosome    Double recombinants
              1(.)2(.)1, 2(.)1(.)2

1             83
2             47
3             44
4             111
5             65
6             86
7             120
8             106
9             63
10            73
11            16
12            52
13            46
14            57
15            15
16            8
17            28
18            28
19            46
X             64

Totals        1158

Double recombinants of the form 1(.)2(.)1 and 2(.)1(.)2 (1,
homozygote; 2, heterozygote) that may represent genotyping
errors (see text) were identified for each chromosome
from the MBx database. (.) represents a variable number of
intervening markers with no genotypes for a haplotype. Up to
10 intervening markers for which there was no genotype
information were permitted. Thus, at the limit, double
recombinants of genotype 1 (10 markers, no genotypes) 2 (10
markers, no genotypes) 1, and 2 (10 markers, no genotypes)
1 (10 markers, no genotypes) 2 were identified. For chromosomes
1 and 5, a further round of error checking of double
recombinant genotypes and reordering took place on these
chromosomes to assess the underlying error rate.
Triple recombinants of the form 11(.)2(.)1(.)22 and
22(.)1(.)2(.)11, which may represent local misorders (see
text), were also identified from MBx. Up to 10 intervening
markers (.) for which there is no genotype information were
again permitted.



Genotyping Error Rate



Following the final rounds of data production, error
checking, and ordering, we chose two chromosomes,
1 and 5, and identified all
1(.)2(.)1 and 2(.)1(.)2 haplotypes for adjacent markers
from within the dataset. Some 1(.)2(.)1 and
2(.)1(.)2 haplotypes will occur as part of more
complex triple recombinant haplotypes [e.g.,
22(.)1(.)2(.)11] among adjacent markers. These
triple recombinant haplotypes can arise because of 
errors in typing, or more likely, errors in ordering in 
which the inversion of the central two markers 
would remove the triple recombinant haplotype 
and substitute a single recombinant haplotype in its 
place (see below). Nevertheless, for chromosomes 1 
and 5, the aberrant genotypes were rescored in all 
cases. This involved repeating the appropriate PCR 
reactions under identical reaction conditions. Fol- 
lowing retyping, these chromosomes were reor- 
dered. Table 5 gives the reduction in the number of 
1(.)2(.)1 and 2(.)1(.)2 haplotypes observed on each 
chromosome following second rounds of error 
checking and ordering and, therefore,, a more accu- 
rate figure of the genotyping error rate. Taking both 
chromosomes together, of the original aberrant 
haplotype genotypings, 48% (134) were found to be 
incorrect, giving an error rate in genotyping for 
these two chromosomes of 0.008 that is, in general, 
in agreement with the genome-wide figure quoted 
above. Extrapolating to the whole genome, if -50% 
of the observed aberrant double recombinants rep- 
resent genotyping errors, then the overall error rate 
is -0.005 or 1 in 200 genotypes in the database. 
What is notable is that a significant number of 
1(.)2(.)1 and 2(.)1(.)2 haplotypes remain on each 
chromosome despite this second round of error 
checking. 
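
The arithmetic behind these figures can be checked against the counts in Table 5. In the Python sketch below, the reading that the 134 corrected haplotypes equal the difference between the first and last columns of Table 5 is an inference from the numbers rather than something the text states outright, and the variable names are illustrative.

```python
# Aberrant double recombinants on chromosomes 1 and 5 (Table 5).
initial = {"1": 125, "5": 157}   # after data production, error checking, ordering
final = {"1": 83, "5": 65}       # after regenotyping and reordering

initial_total = sum(initial.values())
corrected = initial_total - sum(final.values())
fraction_corrected = corrected / initial_total

# Genome-wide rate if every aberrant haplotype were a genotyping error (text: ~0.01).
GENOME_WIDE_UPPER_BOUND = 0.01
# The text's extrapolation: only about half of them are genuine errors.
extrapolated_rate = GENOME_WIDE_UPPER_BOUND * 0.5

print(f"corrected haplotypes: {corrected}")
print(f"fraction corrected: {fraction_corrected:.0%}")
print(f"extrapolated error rate: {extrapolated_rate}")
```

Both the count of corrected haplotypes and the rounded percentage agree with the values quoted in the paragraph above.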

By and large, these aberrant haplotypes do not
result from misorders because the level of triple recombinants
is very low (see below). We have also
considered the possibility that some or all of these
aberrant double recombinants arise because of residual
heterozygosity within the M. spretus mice
used in establishing the backcross. That part of the
EUCIB backcross performed in London used M. spretus
animals from a colony that had not been systematically
inbred, whereas M. spretus animals used
in Paris were from the SEG/Pas colony that is moderately
inbred after 20 generations of unrelaxed
brother-sister matings (Breen et al. 1994). Apparent
double recombinant chromosomes of the form
SSBSS (where S and B are the M. spretus and BL/6
alleles, respectively) could arise if there is residual
heterozygosity in the M. spretus parents with a B
rather than an S allele present at the supposed
double recombinant locus in some members of the
parent M. spretus population. In the backcross to
BL/6, an SSBSS haplotype inherited from the F1
would be scored as a 2(.)1(.)2 haplotype in MBx,
which we have designated a B1 haplotype. Conversely,
in the backcross to M. spretus, an SSBSS haplotype
would be scored as 1(.)2(.)1 in MBx, designated
an S2 haplotype. Both B1 and S2 classes might
be expected to be in excess if residual heterozygosity
was a significant factor. However, residual heterozygosity
from the M. spretus parent population could
not account for haplotypes of the form BBSBB. In
the backcross to BL/6, a BBSBB haplotype inherited
from the F1 would be scored as a 1(.)2(.)1 haplotype
in MBx, which we have designated a B2 haplotype.
Conversely, in the backcross to M. spretus, a BBSBB
haplotype would be scored as 2(.)1(.)2 in MBx,
designated an S1 haplotype. Overall, we find that
there are in total 606 double recombinant haplotypes
in the B1 + S2 class, whereas there are 552
haplotypes in the B2 + S1 class. Residual heterozygosity
does not, therefore, appear to be a major factor
in the appearance of aberrant double recombinants.
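
The four haplotype classes defined above amount to a lookup from the backcross parent and the pattern scored in MBx to a class label and its underlying allele haplotype. The Python sketch below encodes that mapping and tallies the two totals quoted in the text; the dictionary layout and function name are illustrative assumptions.

```python
# (backcross parent, pattern scored in MBx) -> (class label, allele haplotype)
CLASSES = {
    ("BL/6", "2(.)1(.)2"): ("B1", "SSBSS"),
    ("M. spretus", "1(.)2(.)1"): ("S2", "SSBSS"),
    ("BL/6", "1(.)2(.)1"): ("B2", "BBSBB"),
    ("M. spretus", "2(.)1(.)2"): ("S1", "BBSBB"),
}


def classify(backcross_parent, mbx_pattern):
    """Return (class label, allele haplotype) for an aberrant double recombinant."""
    return CLASSES[(backcross_parent, mbx_pattern)]


# Totals quoted in the text: B1 + S2 (SSBSS) and B2 + S1 (BBSBB).
counts = {"SSBSS": 606, "BBSBB": 552}
total = sum(counts.values())
share_ssbss = counts["SSBSS"] / total

print(classify("BL/6", "2(.)1(.)2"))
print(f"total aberrant doubles: {total}")
print(f"share explicable by residual heterozygosity (SSBSS): {share_ssbss:.1%}")
```

The two classes sum to the 1158 aberrant double recombinants reported earlier, and the SSBSS class is only slightly more than half of the total, which is the basis for the conclusion that residual heterozygosity is not a major factor.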

Marker Order Error Rate 

Following retyping and reordering, we also identi- 
fied on every chromosome all triple recombinant 
haplotypes that remained— 11(.)2(.)1(.)22 and 
22(.)1(.)2(.)11— and that potentially represent local 
misorders. The numbers of triple recombinant hap- 
lotypes remaining on each chromosome are also 
given in Table 4. Some chromosomes had no detect- 
able triple recombinants and in total, across the ge- 
nome, we found 160 haplotypes representing the 
likely total number of locally misordered markers. 

DISCUSSION 

We have constructed the first high-resolution ge- 
netic map for a mammalian species. The EUCIB 
high-resolution microsatellite map has allowed us 
to order markers to 2302 bins providing a bin sepa- 



Table 5. Assessing Genotype Error Rates in EUCIB

Chromosome    Double recombinants 1(.)2(.)1/2(.)1(.)2

1             125(a)    86(b)    83(c)
5             157(a)    84(b)    65(c)

Reduction in the numbers of double recombinants following
further rounds of regenotyping and reordering on chromosomes
1 and 5 are given.

(a) After data production, error checking, and ordering.
(b) Remaining double recombinants following regenotyping.
(c) Remaining double recombinants following reorder.
teranchor genetic distances for the primary anchors are then
calculated for each chromosome and the remaining primary 
anchors assigned to the genetic map. Secondary anchors and 
microsatellites are subsequently incorporated into the genetic 
map maintaining genetic order as derived from the haplotype 
analysis. For each primary anchor interval, the total cumula- 
tive number of recombinants separating anchor and micro- 
satellite markers in that interval is derived. The genetic dis- 
tance separating a marker from any other marker or anchor in 
each primary anchor interval can then be calculated from the 
primary anchor genetic distance on the basis of the following 
ratio: 

No. of recombinants separating marker from adjacent marker 
or anchor/Total cumulative no. of recombinants separating 
anchors and markers in the interval 
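
The ratio above scales each primary anchor interval in proportion to recombinant counts. A minimal Python sketch of that calculation follows; the function names and the example interval (2.0 cM with 12 recombinants) are hypothetical values chosen for illustration, not data from the EUCIB map.

```python
def marker_distance_cm(anchor_interval_cm, recombinants_to_adjacent, total_recombinants):
    """Genetic distance between adjacent loci inside one primary anchor interval.

    Implements: interval length x (recombinants separating the marker from the
    adjacent marker or anchor / total cumulative recombinants in the interval).
    """
    if total_recombinants <= 0:
        raise ValueError("the interval must contain at least one recombinant")
    if not 0 <= recombinants_to_adjacent <= total_recombinants:
        raise ValueError("recombinant count must lie between 0 and the interval total")
    return anchor_interval_cm * recombinants_to_adjacent / total_recombinants


def positions_in_interval(anchor_interval_cm, recombinants_between):
    """Cumulative positions (cM from the proximal anchor) of successive loci.

    recombinants_between lists the recombinant counts between consecutive
    loci, ordered from the proximal anchor to the distal anchor.
    """
    total = sum(recombinants_between)
    positions, offset = [], 0.0
    for count in recombinants_between:
        offset += marker_distance_cm(anchor_interval_cm, count, total)
        positions.append(offset)
    return positions


# Hypothetical interval: 2.0 cM between primary anchors, 12 recombinants in total.
print(positions_in_interval(2.0, [3, 0, 6, 3]))
```

Loci separated by zero recombinants receive the same position and so fall in the same bin, and the final position equals the primary anchor distance, so marker order from the haplotype analysis is preserved.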

Microsatellites mapping beyond the most proximal or distal 
primary anchors according to the haplotype analysis are 
added to the genetic map separately. However, it is likely that 
only a fraction of the relevant recombinants separating these 
markers and the primary anchor have been tested and genetic 
distances determined are, therefore, inaccurate. 



Genotyping 

Recombinant panels in any chromosome region were geno- 
typed by use of a high-throughput, semiautomated fluores- 
cent dUTP genotyping approach that has been described re- 
cently (Rhodes et al. 1997). 
ABSTRACT The analysis of DNA for the presence of
particular mutations or polymorphisms can be readily accom- 
plished by differential hybridization with sequence-specific 
oligonucleotide probes. The in vitro DNA amplification tech- 
nique, the polymerase chain reaction (PCR), has facilitated the 
use of these probes by greatly increasing the number of copies 
of target DNA in the sample prior to hybridization. In a 
conventional assay with immobilized PCR product and labeled 
oligonucleotide probes, each probe requires a separate hybrid- 
ization. Here we describe a method by which one can simul- 
taneously screen a sample for all known allelic variants at an 
amplified locus. In this format, the oligonucleotides are given 
homopolymer tails with terminal deoxyribonucleotidyltrans- 
ferase, spotted onto a nylon membrane, and covalently bound 
by UV irradiation. Due to their long length, the tails are 
preferentially bound to the nylon, leaving the oligonucleotide 
probe free to hybridize. The target segment of the DNA sample 
to be tested is PCR-amplified with biotinylated primers and
then hybridized to the membrane containing the immobilized
oligonucleotides under stringent conditions. Hybridization is
detected nonradioactively by binding of streptavidin-horseradish
peroxidase to the biotinylated DNA, followed by a simple
colorimetric reaction. This technique has been applied to HLA-DQA
genotyping (six types) and to the detection of Mediterranean
β-thalassemia mutations (nine alleles).

Differential hybridization with sequence-specific oligonucle- 
otide probes has become a widely used technique for the 
detection of genetic mutations and polymorphisms (1-5). 
When hybridized under the appropriate conditions, these 
synthetic DNA probes (usually 15-20 bases in length) will 
anneal to their complementary target sequences in the sample 
DNA only if they are perfectly matched. In most cases, the 
destabilizing effect of a single base-pair mismatch is sufficient 
to prevent the formation of a stable probe-target duplex (6). 
With an appropriate selection of oligonucleotide probes, the 
relevant genetic content of a DNA sample can be completely 
described. 

This very powerful method of DNA analysis has been 
greatly simplified by the in vitro DNA-amplification tech- 
nique, the polymerase chain reaction (PCR) (7-9). The PCR 
can selectively increase the number of copies of a particular 
DNA segment in a sample by many orders of magnitude. As 
a result of this 10^6- to 10^7-fold amplification, more convenient
assays and nonradioactive detection methods have become 
possible (10-12). These PCR-based assays are usually done 
by amplifying the target segment in the sample to be tested, 
fixing the amplified DNA onto a series of nylon membranes, 
and hybridizing each membrane with one of the labeled 
oligonucleotide probes under stringent hybridization condi- 
tions. However, each probe must still be individually hybrid- 

The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" 
in accordance with 18 U.S.C. §1734 solely to indicate this fact.



ized to the amplified DNA and the process can easily become 
difficult in a system where many different mutations or
polymorphisms occur. 

One approach to address this procedural difficulty is to 
"reverse" the DNAs: attach the oligonucleotides to the 
nylon support and hybridize the amplified sample to the 
membrane. Thus, in a single hybridization reaction, an entire 
series of sequences could be analyzed simultaneously. The 
strategy we adopted was to immobilize the oligonucleotides 
onto nylon filters by ultraviolet fixation. Exposure to UV 
light activates thymine bases in DNA, which then covalently
couple to the primary amines present in nylon (13). It seemed
unlikely, however, that short oligonucleotides could be directly
attached to nylon in this manner and still retain their
ability to discriminate at the level of a single base-pair
mismatch. Consequently, the addition of a long deoxyribothymidine
homopolymer tail, poly(dT), to the 3' end of the
oligonucleotide appeared promising for several reasons.
First, the poly(dT) tail would be a larger target for UV
crosslinking and should preferentially react with the nylon.
Second, dTTP is very readily incorporated onto the 3' ends
of oligonucleotides by terminal deoxyribonucleotidyltransferase
and would permit the synthesis of very long tails (14).
(Deoxyribothymidine would also be the most efficiently
incorporated base if a purely synthetic route were chosen.)
Third, Collins and Hunsaker (15) had shown that the presence
of a poly(dA) homopolymer tail, used to introduce
multiple 35S labels, did not affect the function of sequence-specific
oligonucleotide probes.

We have used this technique to attach oligonucleotide 
probes specific for the six major HLA-DQA DNA types (16) 
and the eight most common Mediterranean β-thalassemia
mutations (4) to nylon filters. The target segment of the DNA
sample to be tested (either HLA-DQA or β-globin) was
amplified by PCR with biotin-labeled primers to introduce a
nonradioactive tag. Hybridization of the amplified product to 
the immobilized oligonucleotides and binding of streptavidin- 
horseradish peroxidase conjugate to the biotinylated primers 
were performed simultaneously. Detection was accom- 
plished by a simple colorimetric reaction involving the en- 
zymatic oxidation of a colorless chromogen that yielded a red 
color wherever hybridization occurred. 

MATERIALS AND METHODS 

Tailing of Oligonucleotides. Oligonucleotides were synthesized
on a DNA synthesizer (model 8700, Biosearch) with
β-cyanoethyl N,N-diisopropylphosphoramidite nucleosides
(American Bionetics, Hayward, CA) by using protocols
provided by the manufacturer. Oligonucleotide (200 pmol)
was tailed in 100 µl of 100 mM potassium cacodylate/25 mM
Tris-HCl/1 mM CoCl2/0.2 mM dithiothreitol, pH 7.6 (17),
with 5-160 nmol deoxyribonucleoside triphosphate (dTTP or

Abbreviation: PCR, polymerase chain reaction. 
dCTP) and 60 units (50 pmol) of terminal deoxyribonucleotidyltransferase
(Ratliff Biochemicals, Los Alamos, NM) for
60 min at 37°C. Reactions were stopped by addition of 100 µl
of 10 mM EDTA. The lengths of the homopolymer tails were
controlled by limiting dTTP or dCTP. For example, a nominal
tail length of 400 dT residues was obtained by using 80 nmol
of dTTP in the above reaction.
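
The nominal tail length follows from the molar ratio of nucleotide to oligonucleotide. The short Python sketch below makes that relationship explicit, assuming (as the word "nominal" implies) that all of the limiting dNTP is incorporated and is shared evenly among the 200 pmol of oligonucleotide; the function names are illustrative.

```python
def nominal_tail_length(dntp_nmol, oligo_pmol=200.0):
    """Average residues added per oligonucleotide if all dNTP is incorporated."""
    return dntp_nmol * 1000.0 / oligo_pmol   # 1 nmol = 1000 pmol


def dntp_needed_nmol(tail_length, oligo_pmol=200.0):
    """Amount of limiting dNTP (nmol) for a desired nominal tail length."""
    return tail_length * oligo_pmol / 1000.0


print(nominal_tail_length(80))                            # the worked example in the text
print(nominal_tail_length(5), nominal_tail_length(160))   # range of dNTP amounts used
print(dntp_needed_nmol(800))
```

With 80 nmol of dTTP the ratio gives the 400-residue tail quoted in the text, and the 5-160 nmol range corresponds to nominal tails of 25 to 800 residues.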

Preparation of Filters. The tailed oligonucleotides were
diluted into 100 µl of TE (10 mM Tris-HCl/0.1 mM EDTA,
pH 8.0) and applied to a nylon membrane (Genetrans-45;
Plasco, Woburn, MA) with a spotting manifold (BioDot;
BioRad). The damp filters were then placed on TE-soaked
paper pads in a UV light box (Stratalinker 1800; Stratagene)
and irradiated at 254 nm. Dosage was controlled by the
device's internal metering unit. The irradiated membranes
were washed in 200 ml of 5x SSPE (1x SSPE is 180 mM
NaCl/10 mM NaH2PO4/1 mM EDTA, pH 7.2) with 0.5%
NaDodSO4 for 30 min at 55°C to remove unbound oligonucleotides.
If not used immediately, the filters were rinsed in
water, air-dried, and stored at room temperature until
needed.

Amplification of DNA. PCR amplification of genomic sequences
was performed by a slight modification of previously
described procedures (9). DNA (0.1-0.5 µg) was amplified in
100 µl containing 50 mM KCl, 10 mM Tris-HCl (pH 8.4), 1.5
mM MgCl2, 10 µg of gelatin, 200 µM each dATP, dCTP,
dGTP, and dTTP, 0.2 µM each biotinylated amplification
primer, and 2.5 units of Thermus aquaticus (Taq) DNA
polymerase (Perkin-Elmer/Cetus). The cycling reaction was
done in a programmable heat block (DNA Thermal Cycler;
Perkin-Elmer/Cetus) set to heat at 95°C for 15 sec (denature),
cool at 55°C for 15 sec (anneal), and incubate at 72°C for 30
sec (extend) by the "Step-Cycle" program. After 30 repetitions,
the samples were incubated an additional 5 min at 72°C.
The primers contained a single molecule of biotin attached to
the 5' end of the oligonucleotides (described below).

Hybridization and Detection of Amplified DNA. Each filter
with bound oligonucleotides was placed in 4 ml of hybridization
solution containing 5x SSPE, 0.5% NaDodSO4, and
400 ng of streptavidin-horseradish peroxidase conjugate
(SeeQuence; Eastman Kodak). PCR-amplified DNA (20 µl)
was denatured by addition of an equal volume of 400 mM
NaOH/10 mM EDTA and added immediately to the hybridization
solution, which was then incubated at 55°C for 30 min.
(During this incubation, hybridization of PCR product to
immobilized oligonucleotide and binding of streptavidin-horseradish
peroxidase to biotin present in the PCR product
occur simultaneously.) The filters were briefly rinsed twice in
2x SSPE/0.1% NaDodSO4 at room temperature, washed
once in 2x SSPE/0.1% NaDodSO4 at 55°C for 10 min, and
then briefly rinsed twice in 2x PBS (1x PBS is 137 mM
NaCl/2.7 mM KCl/8 mM Na2HPO4/1.5 mM KH2PO4, pH
7.4) at room temperature. Color development was performed
by incubating the filters in 25-50 ml of red leuco dye
(Eastman Kodak) at room temperature for 5-10 min. Photographs
were taken for permanent records.

Synthesis of Biotinylated Oligonucleotide Primers. Primary
amino groups were introduced at the 5' termini of the primers
by a variation of published procedures (18, 19). In brief,
tetraethylene glycol was converted to the monophthalimido
derivative by reaction with phthalimide in the presence of
triphenylphosphine and diisopropyl azodicarboxylate (20).
The monophthalimide was converted to the corresponding
β-cyanoethyl diisopropylamino phosphoramidite by standard
protocols (21). The resulting phthalimido amidite was added
to the 5' ends of the oligonucleotides during the final cycle of
automated DNA synthesis by using standard coupling conditions.
During normal deprotection of the DNA (concentrated
aqueous ammonia for 5 hr at 55°C), the phthalimido



quently acylated with an appropriate biotin active ester. 
NHS-LC-biotin (Pierce) was selected for its water solubility 
and lack of steric hindrance. The biotinylation was performed 
on crude, deprotected oligonucleotide, and the mixture was 
purified by a combination of gel filtration and reversed-phase 
HPLC. Additional details of this procedure will be published 
elsewhere (22). 

RESULTS 

Binding and Hybridization Efficiency of Tailed Oligonucle- 
otides. The relative efficiencies with which synthetic oligo- 
nucleotides with homopolymer tails of various lengths were 
covalently bound to the nylon filter were measured as a
function of UV exposure (Fig. 1 Left). Oligonucleotides with
longer poly(dT) tails were more readily fixed to the membrane,
and all attained their maximum values by 240 mJ/cm2
of irradiation at 254 nm. In contrast, the (dC)400-tailed oligonucleotide
required more irradiation to crosslink to the nylon
and was not comparable to the equivalent (dT)400 construct
even after 600 mJ/cm2 exposure. This difference is consistent
with the findings of Church and Gilbert (13) that suggested 
light-activated thymine bases bind more effectively to nylon 
than do cytosine bases. The untailed oligonucleotide was also 
retained by the membrane in a manner that roughly paralleled 
the poly(dC) product. 

Efficient binding of oligonucleotides to the membrane, 
however, does not necessarily correlate with hybridization 
efficiency, and so hybridization efficiency as a function of 
UV dosage was determined in a separate experiment (Fig. 1 
Right). These results show a distinct optimum of exposure 
that changes with the length of the poly(dT) tail and is more
sharply pronounced for the longer tails. Additional experiments
have shown the optimal dosages to be about 20 mJ/cm2
for the (dT)800 and 40 mJ/cm2 for the (dT)400 oligonucleotides
(R.K.S., unpublished observations). The peak efficiencies of
the (dT)400 and (dT)800 constructs are around 1% (45-50 fmol
of radiolabeled probe annealed to ≈3.5 pmol of tailed oligonucleotide),
which is similar to the value reported by Gamper
et al. (23) for an oligonucleotide probe hybridized to nylon-bound
plasmid DNA.

Comparison of the data in Fig. 1 Left and Right for 60 
mJ/cm2 irradiation indicates that oligonucleotides with
longer tails hybridize more effectively than can be accounted 
for by the additional amounts bound to the filter. This 
suggests a spacer effect wherein the poly(dT) tails improve 
hybridization efficiency by increasing the distance between 
the nylon membrane and the terminal oligonucleotide probe. 
Besides possible UV damage to the DNA itself, additional 
exposure causes more of the tail to become attached to the 
membrane, thus reducing the average spacer length and 
decreasing hybridization efficiency. The markedly different 
hybridization profile of the poly(dC) oligonucleotide is com- 
patible with this interpretation. Because cytosines react less 
efficiently with the filter, hybridization efficiency reaches a 
plateau where loss due to UV damage and tail shortening are 
compensated by the fixing of new molecules (see Fig. 1 Left). 
This characteristic of cytosine may make a poly(dC) tail 
desirable when UV irradiation cannot be carefully controlled. 
Under the stringent hybridization conditions used in this 
experiment, no signal was detected for the untailed oligonu- 
cleotide. 

DNA Typing at the HLA-DQA Locus. The HLA-DQA test is 
derived from a PCR-based oligonucleotide typing system that 
partitions the polymorphic variants at the DQA locus into 
four major DNA types, DQA1 to DQA4, and three DQA1
subtypes, DQA1.1 to DQA1.3 (16). Four oligonucleotides
specific for the major DQA types, four oligonucleotides that
characterize the DQA1 subtypes, and one control oligonu-
cleotide (Table 1) were given 400-base poly(dT) tails and spotted onto nylon
filters. The sequence variation that defines the DQA types is 
localized within a relatively small "hyper-variable" region of 
the second exon (24) that can be encompassed within a single 
242-base-pair PCR amplification fragment. Biotinylated 
primers (Table 1) were used to amplify the DQA fragment 
from several genomic DNA samples: six homozygous cell 
lines and six heterozygous individuals. After hybridization of 
the amplified DNA to the membranes and color development, 
the DQA genotypes of these samples were readily apparent

Table 1. Sequences of oligonucleotide primers and probes 



Although most of the oligonucleotide probes are uniquely
specific for one DQA type, two of the DQA1 subtyping
probes cross-hybridize to several DNA types. GH89 hybrid-
izes to a sequence common to the DQA1.2, DQA1.3, and
DQA4 types, and the probe GH76 detects all DQA types
except DQA1.3. (The latter is needed to distinguish DQA1.2/
1.3 heterozygotes from DQA1.3/1.3 homozygotes.) The
length and strand specificity of the oligonucleotides were 
empirically adjusted until their relative hybridization efficien- 
cies and stringency requirements for allelic discrimination 
were approximately the same. (This was achieved by deter- 



Name*       Function            Sequence

RS151       DQA primer          b-GTGCTGCAGGTGTAAACTTGTACCAG†
RS152       DQA primer          b-CACGGATCCGGTAGCAGCGGTAGAGTTG*
RH54 (2)    All DQA types       CTACGTGGACCTGGAGAGGAAGGAGACTGCCTG
GH75 (4)    DQA1 probe          CTCAGGCCACCGCCAGGCA
RH71 (4)    DQA2 probe          TTCCACAGACTTAGATTTGAC
GH67 (4)    DQA3 probe          TTCCGCAGATTTAGAAGAT
GH66 (4)    DQA4 probe          TGTTTGCCTGTTCTCAGAC
GH88 (4)    DQA1.1 probe        CGTAGAACTCCTCATCTCC
GH89 (4)    DQA1.2, -1.3, -4    GATGAGCAGTTCTACGTGG
GH77 (4)    DQA1.3 probe        CTGGAGAAGAAGGAGAC
GH76 (4)    Not DQA1.3          GTCTCCTTCCTCTCCAG



Name*         Function           Sequence

RS151         β-Globin primer    b-ATCACTTAGACCTCACCCTG*
RS152         β-Globin primer    b-GACCTCCCACATTCCCTTTT*
RS187 (8)     Normal             TAGACCAATAGGCAGAGAG
RS188 (8)     Mutant             CTCTCTGCCTATTAGTCTA
RS87 (4)      Normal             CCTTGGACCCAGAGGTTCT
RS89 (4)      Mutant             AGAACCTCTAGGTCCAAGG
RS189 (0.33)  Normal             CTTGATACCAACCTGCCCA*
RS190 (0.33)  Mutant             TGGGCAGGTTGGCATCAAG
RS191 (1)     Mutant             TGGGCAGATTGGTATCAAG
RS192 (4)     Normal             CCATAGACTCACCCTGAAG
RS193 (4)     Mutant             CTTCAGGATGAGTCTATGG
RS201 (2)     Normal             GCAGAATGGTAGCTGGATT
RS202 (2)     Mutant             GCAGAATGGTACCTGGATT
RS196 (4)     Normal             ACTCCTGAGGAGAAGTCTG*
RS197 (4)     Mutant             GACTCCTGGGAGAAGTCTG
RS198 (4)     Mutant             TGACTCCTGAGGAGGTCTG

*The number in parentheses is the amount (pmol) of tailed oligonucleotide probe applied to the nylon membrane.

*Some of the β-globin oligonucleotide probes each span two sites of potential β-thalassemia mutations and are specific for normal sequences at both

Fig. 2. DNA typing at the HLA-DQA locus. Each tailed oligo-
nucleotide probe was spotted onto 12 duplicate membranes, irradi-
ated at 40 mJ/cm², hybridized with amplified DQA sequences in
genomic DNA samples, and treated for color development. The
specificity of each immobilized oligonucleotide is given at the top,
and the DQA genotype of each sample is noted at the right. The name,
amount applied to the membrane, specificity, and sequence of each
oligonucleotide are listed in Table 1.

mining the optimal hybridization conditions for each member 
of an initial set of probes, then shortening or lengthening each 
oligonucleotide until they all hybridized under equivalent 
conditions.) These eight probes produce a unique hybridiza-
tion pattern for each of the 21 possible DQA diploid combi-
nations.
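
The count of 21 and the uniqueness of the patterns can be checked by enumeration. The sketch below assumes the six DQA types named in the text and the probe specificities as read from Table 1 and the surrounding paragraph; the assignment of GH77 as the DQA1.3-specific probe is an assumption taken from a damaged table entry.

```python
from itertools import combinations_with_replacement

# Probe -> DQA types it detects, transcribed from Table 1 and the text.
# GH77 as the DQA1.3-specific probe is an assumption (damaged table entry).
PROBES = {
    "GH75": {"1.1", "1.2", "1.3"},          # DQA1
    "RH71": {"2"},                          # DQA2
    "GH67": {"3"},                          # DQA3
    "GH66": {"4"},                          # DQA4
    "GH88": {"1.1"},                        # DQA1.1
    "GH89": {"1.2", "1.3", "4"},            # DQA1.2, DQA1.3, DQA4
    "GH77": {"1.3"},                        # DQA1.3
    "GH76": {"1.1", "1.2", "2", "3", "4"},  # all types except DQA1.3
}
ALLELES = ["1.1", "1.2", "1.3", "2", "3", "4"]

def pattern(genotype):
    """Set of probes expected to give a signal for a pair of alleles."""
    present = set(genotype)
    return frozenset(name for name, hits in PROBES.items() if hits & present)

genotypes = list(combinations_with_replacement(ALLELES, 2))
patterns = {g: pattern(g) for g in genotypes}
print(len(genotypes), "genotypes,", len(set(patterns.values())), "distinct patterns")
```

In this model GH76 is the only probe that separates DQA1.2/1.3 from DQA1.3/1.3, which matches the reason given for including it.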

Detection of β-Thalassemia Mutations. Although there are
>54 characterized mutations of the β-globin gene that can
give rise to β-thalassemia (25), each ethnic group in which
this disease is prevalent has a limited number of common
mutations (4, 26, 27). In Mediterranean populations, 8 mu-
tations are responsible for >90% of the β-thalassemia alleles
(4). Oligonucleotides were synthesized that are specific for 
each of these 8 mutations as well as their corresponding 
normal sequences (Table 1). The oligonucleotides were given
(dT)400 tails with terminal transferase and applied to mem-
branes. Since the β-thalassemia mutations are distributed
throughout the β-globin gene, biotinylated PCR primers that
amplify the entire gene in a single 1780-base-pair fragment
were used. (This amplification product encompasses all
known β-thalassemia mutations, not only the predominant
Mediterranean mutations examined here.) After hybridiza-
tion and color development, the β-globin genotypes could be
determined by noting the pattern of hybridization (Fig. 3). 

Unlike the DQA typing system, two oligonucleotide probes 
are needed to analyze each mutation— one specific for the 
normal sequence and one specific for the mutant sequence — 
in order to differentiate normal/mutant heterozygous carriers 
from mutant/mutant homozygotes. A complicating factor in 
this analysis is caused by apparent secondary structure in 
various portions of the relatively long β-globin amplification
product that interferes with oligonucleotide hybridization.
The relatively high stringency needed to minimize this sec-
ondary structure requires the use of longer (e.g., 19-base)
oligonucleotide probes. Because this constraint would not 
permit varying the length of the oligonucleotides to compen- 
sate for different hybridization efficiencies, the "balancing" 
of signal intensities was accomplished by adjusting the 
amount of each oligonucleotide applied to the membrane. 
This was done by applying various amounts of each oligo- 
nucleotide onto a membrane and then, after hybridization and 
color development, simply selecting the positive spots that 
had similar intensity. 
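
The balancing step amounts to picking, for each probe, the applied amount whose signal is closest to a common target. The sketch below illustrates that selection with made-up numeric readings; the selection described above was made by inspecting the developed spots, so the intensity values, probe names, and function name are all hypothetical.

```python
def balance_amounts(intensity, target):
    """For each probe, pick the applied amount whose signal is closest to target.

    intensity maps probe name -> {amount in pmol: measured signal}.
    """
    return {
        probe: min(by_amount, key=lambda amt: abs(by_amount[amt] - target))
        for probe, by_amount in intensity.items()
    }

# Hypothetical readings for two probes spotted at several amounts (pmol).
readings = {
    "probe_A": {0.33: 0.9, 1: 1.8, 4: 3.5},
    "probe_B": {1: 0.4, 4: 1.0, 8: 1.7},
}
chosen = balance_amounts(readings, target=1.0)
print(chosen)
```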

DISCUSSION 

These studies have demonstrated the feasibility of immobi- 
lizing sequence-specific probes onto nylon membranes and 
hybridizing PCR-amplified, biotin-labeled genomic frag- 
ments to the filters to determine the genetic content of the 
DNA sample. We have applied this method to HLA-DQA 
genotyping and to the detection of β-thalassemia mutations.
Although the number of probes used in the two tests was
modest (9 for DQA and 14 for β-thalassemia), expanding the
analyses to include even more oligonucleotides should not be 
difficult. 

The recently described technique of simultaneous ampli-
fication of several DNA fragments, "multiplex" PCR (28),
should readily permit the concurrent analysis of multiple
genetic loci. Using the immobilized-probe format, we have
been able to simultaneously amplify and type at three loci: the
HindIII polymorphism in the Gγ-globin gene (29), the Ava II

Fig. 3. Detection of β-thalassemia
mutations. Various amounts of each
tailed oligonucleotide probe were applied
to 12 duplicate nylon filters, irradiated at
40 mJ/cm², hybridized with amplified
β-globin sequences in genomic DNA
samples, and treated for color develop-
ment. The β-thalassemia locus that is
detected by each immobilized oligonucle-
otide pair is given at the top of the filters.
For each filter, the upper row contains
the oligonucleotide probes that are spe-
cific for the normal sequence and the
lower row contains the oligonucleotides
specific for the mutant sequences. The
β-globin genotype of each sample is
noted at the right. The name, amount
applied to the membrane, specificity, and
sequence of each oligonucleotide are
listed in Table 1. IVS, intervening se-
quence (intron).

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polymorphism in the low density lipoprotein receptor gene 
(30), and the HLA-DQA gene (R.K.S., unpublished obser- 
vations). Other genetic targets whose analysis would be 
simplified by this technique include the detection of somatic 
mutations in the RAS genes, where 6 loci and 66 possible 
alleles occur (31), some of the HLA class II β-chain genes,
where as many as 25 alleles can be detected (T. Bugawan, S.
Scharf, and H.A.E., unpublished observations), and β-
thalassemia in Middle Eastern populations, where in addition
to the endogenous mutations, Mediterranean and Asian In- 
dian mutations are present at significant frequencies (H. 
Kazazian, personal communication). This format should also 
prove useful for the detection of infectious pathogens or for 
environmental surveys of microorganisms by immobilizing a 
panel of species-specific probes. 

The ability to label probes and detect their hybridization 
without radioactivity is a convenient feature of PCR-based 
DNA tests and, perhaps more importantly, makes this type 
of analysis feasible in areas where radioactive labeling re- 
agents are difficult to obtain. In this report, a biotin tag was 
introduced into the PCR products by means of 5'-biotinylated 
primers. An alternative labeling strategy based on the incor- 
poration of biotinylated dUTP (32) has also been tried and 
shown to be very effective (R.K.S., unpublished observa- 
tions). 

One of the prerequisites of this analytical method is that all 
of the bound oligonucleotides must be sequence-specific 
under the same hybridization conditions. If necessary, this 
requirement can probably be met either by adjusting the 
length, position, and strand specificity of the probes, as was 
done for the HLA-DQA assay, or by varying the amount 
applied to the membrane, as was done for the β-thalassemia
assay. The presence of tetramethylammonium chloride in the
hybridization buffer can also serve to minimize the differ- 
ences among immobilized oligonucleotides caused by vary- 
ing base compositions (ref. 33; T. Bugawan, personal com- 
munication). 

Although it may entail some initial effort, the end result is 
a simple, robust, and potentially automatable system that can 
be completed (amplification, hybridization, and color devel- 
opment) in 3-4 hr. "Reverse dot blots" should be particu- 
larly valuable for assays where the number of potential 
sequence variations exceeds the number of samples to be 
tested. Even in situations where the number of samples and 
probes are approximately equal, the immobilized-probe for-
mat may be preferable since many filters can be prepared at
one time and stored until needed. To date, this typing system 
has been used to determine the HLA-DQA genotype of >300 
unknown samples in forensic and disease-susceptibility stud- 
ies. 
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Goda, D. Spasic, and C.-A. Chang for synthesis of oligonucleotides, 
C. Perez for advice on terminal transferase tailing reaction, C. 
Dowling and H. Kazazian (Johns Hopkins) for sequences of β-globin
PCR primers and β-thalassemia genomic DNA samples, S. Warren
and J. Findlay (Eastman Kodak) for red leuco dye suspension, and 
T. White, D. Gelfand, and H. Kazazian for critical review of the 
manuscript. 
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We describe a new technique by which single base 
changes in human genes can be conveniently detected. In 
this method the DNA fragment of interest is first amplified 
using the polymerase chain reaction with an oligonucleo-
tide primer biotinylated at its 5'-end. The amplified 5'-bio-
tinylated DNA is immobilized on an avidin matrix and ren-
dered single-stranded. The variable nucleotide in the im-
mobilized DNA is identified by a one-step primer extension
reaction directed by a detection step primer, which anneals 
to the DNA immediately upstream of the site of variation. 
In this reaction a single labeled nucleoside triphosphate 
complementary to the nucleotide at the variable site is in- 
corporated. The method is highly sensitive, allowing the 
use of nucleoside triphosphates labeled with radioisotopes 
of low specific activity (³H) as well as nonradioactive
markers (digoxigenin). The procedure consists of few and 
simple operations and is thus applicable to the analysis of 
large numbers of samples. Here we applied it to the analy-
sis of the three-allelic polymorphism of the human apolipo-
protein E gene. We were able to correctly identify all possi-
ble combinations of the three apo E alleles. © 1990 Academic
Press, Inc.



INTRODUCTION 

Changes in only one or a few nucleotides have been 
found to cause several types of human hereditary dis- 
eases (Antonarakis, 1989). The increasing under- 
standing of the exact nature of the genetic defects 
causing human diseases has created a need for conve- 
nient diagnostic methods to detect these defects, at 
both the prenatal and postnatal stages. Polymerase 
chain reaction (PCR) amplification (Mullis and Fa- 
loona, 1987) has significantly improved both the sensi- 
tivity and the specificity of analyzing minute changes 
in the human genome. A number of techniques have 
been employed to identify mutations in the amplified 
DNA. In some cases, nucleotide substitutions in the 



amplified fragments may be detected by analysis of 
restriction site variation (Saiki et al., 1985; Kogan et
al., 1987), which is feasible when the nucleotide varia-
tion alters a restriction enzyme cleavage site. Dot blot
hybridization with sequence-specific oligonucleotide
probes is another generally used method (Saiki et al.,
1986; Smeets et al., 1988; Farr et al., 1988). Although
its use is not limited to the identification of nucleotide 
changes creating or eliminating restriction sites, this 
technique does not allow the identification of somatic 
mutations present in a small fraction of the cell popu- 
lation. 

Several strategies have been employed to develop 
new, more convenient methods for the detection of 
nucleotide variations in the amplified DNA. In a "re-
version" of oligonucleotide hybridization the se-
quence-specific oligonucleotide probes are immobi-
lized, which allows the simultaneous analysis of one
sample with several probes (Saiki et al., 1989). Instead
of being used as probes for the nucleotide change itself,
allele-specific oligonucleotides have also been used to prime
the PCR amplification. In this technique, allele-spe-
cific amplification is achieved using a primer with a
3'-mismatch at the position of the variable nucleotide
(Wu et al., 1989) or by a competitive priming reaction,
in which the mismatch is located within the primer
(Gibbs et al., 1989). A ligation-mediated reaction, in
which a pair of oligonucleotides annealing at adjacent
positions at the site of the nucleotide variation is joined, has
been used to analyze PCR-amplified DNA (Lande-
gren et al., 1988; Wu and Wallace, 1989). Attachment
of a "GC-clamp" to the 5'-end of the amplified DNA
allows the identification of single base substitutions
on the basis of altered melting properties of the DNA
fragments in denaturing gradient gel electrophoresis
(Sheffield et al., 1989). Another approach is to intro-
duce mobility shifting nucleotide analogs into the
amplified DNA sample. The nucleotide variation is 
then identified by observing the mobility of the DNA 



0888-7543/90 $3.00

Copyright © 1990 by Academic Press, Inc.

All rights of reproduction in any form reserved.



PRIMER-GUIDED NUCLEOTIDE INCORPORATION ASSAY



fragments using electrophoresis in a denaturing gel 
(Kornher and Livak, 1989). Chemical cleavage of
mismatched duplexes in the amplified DNA has also
been used to detect point mutations (Cotton et al.,
1988).

When using sequence-specific oligonucleotides as 
hybridization probes or as PCR primers, the reaction 
conditions are extremely critical. The gel electropho-
retic separation step required in several of the meth-
ods described above is inconvenient to carry out for
large numbers of samples. 

We report a new method for the identification of 
single base variations in human DNA, in which the 
above-mentioned limitations are avoided. We applied 
the method to analyze the genetic polymorphism of 
the human apolipoprotein E (apo E). Apo E plays an 
important role in lipoprotein metabolism (Mahley, 
1988). It is both an integral component and a media-
tor of cellular uptake of several lipoproteins. In popu-
lation studies it has been shown that the polymor-
phism of the apo E, which is due to single base substi-
tutions at two loci in the apo E gene, can explain as
much as 10% of the individual variations in serum
cholesterol levels (Davignon et al., 1988). In addition,
this polymorphism may also affect the individual risk
of atherosclerotic vascular disease (Davignon et al.,
1988). Apo E exists as three common isoforms, E2,
E3, and E4. These isoforms are encoded by three al-
leles (ε2, ε3, and ε4) that differ from each other by
single base substitutions in the codons for amino acid
residues 112 and 158. The frequencies of the ε2, ε3,
and ε4 alleles are 10, 75, and 15%, respectively (Mah-
ley, 1988). We reasoned that a convenient DNA tech-
nique for apo E genotyping would be very useful in
genetic-epidemiological studies of hyperlipoproteine- 
mias and may also have impact for the early detection 
of individuals with increased risk of cardiovascular 
disease. 

MATERIALS AND METHODS 

Clinical Samples 

Venous blood samples were obtained from patients 
attending the Lipid Outpatient Clinic of the Univer- 
sity Central Hospital of Helsinki. Apo E phenotyping 
was accomplished by isoelectric focusing (Ehnholm et
al., 1986). Leukocytic DNA was isolated according to
Bell et al. (1981).

Preparation of the Primers

Four PCR primers (denoted P1-P4) and two detec- 
tion step primers (D112 and D158) were synthesized 
on an Applied Biosystems 381A DNA synthesizer
(Beaucage and Caruthers, 1982). The primers were
designed on the basis of the known nucleotide se-
quence of the apo E gene (Paik et al., 1985). Their
sequences and positions on the apo E gene are given 
in Table 1. A 5'-amino group was added to the primer 
P2 with the Aminolink II reagent (Applied Biosys- 
tems). A biotin residue was attached to the amino 
group using a water soluble sulfo-NHS-biotin ester 
(Pierce Chemical Co.) and the biotinylated oligonucle-
otide was purified by reversed phase HPLC (Bengt-
strom et al., 1990).

Polymerase Chain Reaction 

The DNA (100 ng per sample) was amplified with
the P1 and P4 primers (final concentrations 1 µM) in
100 µl of a solution of 0.2 mM each of dATP, dCTP,
dGTP, dTTP, 20 mM Tris-HCl, pH 8.8, 15 mM
(NH4)2SO4, 1.5 mM MgCl2, 0.1% Tween 20, 0.01%
gelatin, and 2.5 units of Thermus aquaticus (Taq)
DNA polymerase (United States Biochemical Corp.)
in a DNA thermal cycler (Perkin-Elmer/Cetus) for 25
cycles of 1 min at 96°C and 2 min at 65°C. For ream-
plification with a nested pair of primers, an aliquot (3
µl of a 1:100 dilution) of the first PCR product ampli-
fied with P1 and P4 was transferred to a second PCR.
This was carried out under the conditions described
above and directed by the biotinylated primer P2 and
the primer P3 at 1 or 0.1 µM concentration.
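
Because 1 µM equals 1 pmol/µl, the amounts of primer and nucleotide in each reaction follow directly from the concentrations. The sketch below works this out, assuming that the second PCR is also run in 100 µl (the text says only that the same conditions were used); the function name is illustrative.

```python
def pmol_in_reaction(conc_uM, volume_ul):
    """Amount in pmol, using 1 uM = 1 pmol/ul."""
    return conc_uM * volume_ul

primary_primer = pmol_in_reaction(1.0, 100)        # P1 or P4 at 1 uM
nested_primer_low = pmol_in_reaction(0.1, 100)     # P2 or P3 at 0.1 uM
dntp_nmol = pmol_in_reaction(200.0, 100) / 1000.0  # each dNTP at 0.2 mM

print(primary_primer, nested_primer_low, dntp_nmol)
```

Under that assumption the 0.1 µM condition corresponds to 10 pmol of biotinylated primer per reaction.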

Affinity-Capture of the Biotinylated Amplified DNA 
on Avidin-Coated Polystyrene Particles 

Ten microliters of a 5% (w/v) suspension of avidin-
coated polystyrene particles (0.8 µm, Baxter Health-
care Corp.) was added to an 80-µl aliquot of the PCR
mixture. The samples were kept at 20°C for 30 min.
The particles were collected by centrifugation for 2 
min in an Eppendorf centrifuge at 6000g and were 
washed once by vortexing vigorously for a few seconds 
with 1 ml of 15 mM NaCl, 1.5 mM Na-citrate (0.1X 
SSC), 0.2% sodium dodecyl sulfate (SDS), and once 
with 1 ml of 0.1% Tween 20 in 0.15 M NaCl, 20 mM 
phosphate buffer, pH 7.5 (PBS). The particles were 
treated twice with 200 µl of 0.15 M NaOH for 5 min at
20°C. The particles were then washed once with 1 ml 
of 0.1% Tween 20 in 50 mM NaCl, 40 mM Tris-HCl, 
pH 7.5, and twice with 1 ml of 0.01% Tween 20 in 50 
mM NaCl, 40 mM Tris-HCl, pH 7.5. The particles are 
easy to handle when the solutions contain detergent 
and the concentration of NaCl is not above 0.15 M. 

The suspension of particles in the last washing so- 
lution was divided into two or four aliquots depending 
on the labeling system to be used. The particles were 
collected by centrifugation in separate tubes. 

The Detection Step Primer Extension Reaction on
Microparticles

The particles carrying the DNA fragment were sus-
pended in 10 µl of 50 mM NaCl, 20 mM MgCl2, 40 mM
Tris-HCl, pH 7.5, containing 2 pmol of the D112 or
the D158 primer. The primer was allowed to anneal to
the DNA template by heating the samples at 65°C for
2 min and cooling them to 20°C during the next 30
min. One microliter of 0.1 M dithiothreitol (DTT)
and a mixture of the appropriate deoxynucleoside tri-
phosphates (dNTPs) and dideoxynucleoside triphos-
phates (ddNTPs) were added to yield 1 µM concen-
trations each in a final volume of 15 µl. For identifica-
tion of T, [³⁵S]dTTP (600 Ci/mmol, Amersham; 6
pmol diluted with 9 pmol of unlabeled dTTP),
ddCTP, and ddGTP were added to one sample. For
the identification of C, [³⁵S]dCTP (1000 Ci/mmol,
Amersham; 5 pmol diluted with 10 pmol of unlabeled
dCTP), ddTTP, and ddGTP were added to another
sample. Alternatively, [³H]dTTP (114 Ci/mmol,
Amersham) and [³²P]dCTP (3000 Ci/mmol, Amer-
sham), diluted in unlabeled dCTP to a specific activ-
ity of 150 Ci/mmol, and ddGTP were added to one
sample. Two microliters (3 units) of T7 DNA polymer-
ase (Sequenase, United States Biochemical Corp.)
was added to each tube and the reaction was allowed
to proceed for 6 min at 42°C.
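
Diluting the labeled nucleotide with unlabeled nucleotide fixes both the final concentration and the effective specific activity. The sketch below computes both for the two ³⁵S mixtures, assuming the unlabeled nucleotide carries no radioactivity and using the 15-µl volume stated above; the function name is illustrative.

```python
def diluted_label(labeled_pmol, unlabeled_pmol, specific_activity, volume_ul):
    """Return (concentration in uM, specific activity in Ci/mmol) after mixing."""
    total_pmol = labeled_pmol + unlabeled_pmol
    conc_uM = total_pmol / volume_ul  # pmol/ul is numerically equal to uM
    activity = specific_activity * labeled_pmol / total_pmol
    return conc_uM, activity

dttp = diluted_label(6, 9, 600, 15)    # [35S]dTTP mixture
dctp = diluted_label(5, 10, 1000, 15)  # [35S]dCTP mixture
print(dttp, dctp)
```

Both mixtures contain 15 pmol in 15 µl, which agrees with the 1 µM concentration given for each nucleotide.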

For comparison, Taq DNA polymerase and Esche- 
richia coli DNA polymerase I (the Klenow fragment) 
were also used. The reaction with the Taq DNA poly-
merase was carried out at 55°C in the PCR buffer. A
60-fold excess of ddNTP was added to the reaction
with the Klenow DNA polymerase and the buffer
contained 50 mM Tris-HCl, pH 8.1, 2 mM dithio-
threitol, 5 mM MgCl2, 40 mM KCl.

After the incorporation step the microparticles 
were washed twice with 1 ml of 0.1X SSC, 0.2% SDS, 
and twice with 0.1% Tween 20 in PBS at 20°C. For 
elution of the reaction products the particles were 
boiled in 200 µl of H2O for 5 min, cooled on ice, and
centrifuged for 2 min in an Eppendorf centrifuge. The
eluted radioactivity was measured in a liquid scintil-
lation counter (Rackbeta 1219, Pharmacia/Wallac).
³H and ³²P were measured simultaneously by setting
the window for ³H at channels 10-90 and the window
for ³²P at channels 130-220.
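
Dual-label counting with two channel windows can be pictured as summing a pulse-height spectrum over two separate ranges. The sketch below is an illustration only: the spectrum is made up, the 256-channel length is an assumption, and no correction for spillover between windows is attempted.

```python
H3_WINDOW = (10, 90)     # channels counted as 3H, inclusive
P32_WINDOW = (130, 220)  # channels counted as 32P, inclusive

def window_counts(spectrum, window):
    """Sum counts over an inclusive channel range; spectrum[i] is channel i."""
    first, last = window
    return sum(spectrum[first:last + 1])

# Made-up spectrum with one flat low-energy and one flat high-energy signal.
spectrum = [0] * 256
for channel in range(20, 60):
    spectrum[channel] = 5
for channel in range(150, 200):
    spectrum[channel] = 3

h3_counts = window_counts(spectrum, H3_WINDOW)
p32_counts = window_counts(spectrum, P32_WINDOW)
print(h3_counts, p32_counts)
```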

Affinity-Capture of the Amplified DNA in Avidin- 
Coated Microtitration Wells 

Two 15-µl aliquots of the second PCR mixture (am-
plified using 0.1 µM concentrations of primers) were
transferred to microtitration wells (Nunc, Maxisorb) 
that had been coated with streptavidin by passive ad- 
sorption. Thirty microliters of 0.1% Tween 20 in 0.15 
M NaCl, 0.1 M Tris-HCl, pH 7.5 (TBS), was added to 
each well. The microtitration strips were incubated 
for 3 h at 37°C with gentle shaking. The wells were 
washed three times with 200 µl of 0.1% Tween 20 in
TBS at 20°C. The wells were then treated twice with
100 µl of 50 mM NaOH for 5 min at 20°C, followed by
washing twice with 200 µl of 0.1X SSC, 0.2% SDS,
twice with 0.1% Tween 20 in TBS, once with 0.1%
Tween 20 in 50 mM NaCl, 40 mM Tris-HCl, pH 7.5,
and finally once with 0.01% Tween 20 in 50 mM
NaCl, 40 mM Tris-HCl, pH 7.5.

The Detection Step Primer Extension Reaction in
Microtitration Wells

Ten picomoles of the primer D112 or D158 was
added to each well in 50 µl of 0.9 M NaCl, 0.2 M Tris-
HCl, pH 7.5. The wells were heated for 2 min at 65°C
and allowed to cool to 20°C during 30 min. The mix-
ture was discarded and the wells were washed once
with 200 µl of 0.25 M NaCl, 0.2 M Tris-HCl, pH 7.5,
at 20°C. Fifty microliters of a solution consisting of 1
µM digoxigenin-11-dUTP (Boehringer-Mannheim),
1 µM ddCTP, 1 µM ddGTP, 0.2 µM primer (D112 or
D158), 6 mM dithiothreitol, 37.5 mM NaCl, 15 mM
MgCl2, 30 mM Tris-HCl, pH 7.5, and 3 units of T7
DNA polymerase was added. The microtitration
strips were incubated for 10 min at 42°C, and the
wells were washed twice with 200 µl of 0.1X SSC,
0.2% SDS and three times with 200 µl of 0.1% Tween
20 in TBS. Then, 60 µl of a 1:1000 dilution of an
anti-digoxigenin-alkaline phosphatase conjugate
(Boehringer-Mannheim) in a solution of 0.1% Tween
20, 1% bovine serum albumin in TBS was added, and
the microtitration strips were incubated for 2 h at
37°C with gentle shaking. The wells were washed six
times with 0.1% Tween 20 in TBS and once with 1
mM diethanolamine-0.5 mM MgCl2 buffer, pH 10.
Finally 160 µl of 2 mM p-nitrophenyl phosphate in
the alkaline buffer was added. After development of
color for 20 min at room temperature the absorbance
of the formed product was measured at 405 nm in a
spectrophotometric reader.

RESULTS AND DISCUSSION 

Principle of the Method 

In the present technique the nucleotide at the vari- 
able site of the target DNA is detected by a one-step 
primer extension reaction. A specific primer, which 
anneals to the target DNA immediately upstream of 
the variable site, is elongated with a single labeled 
nucleoside triphosphate complementary to the vari- 
able nucleotide (Fig. 1). 

Prior to the detection step (primer extension reac- 
tion) the target DNA is immobilized and rendered 
single-stranded. This is accomplished by first ampli-
fying the DNA region containing the variable nucleo-
tide using a 5'-biotinylated PCR primer (Syvanen et
al., 1988). The synthesized DNA fragment carrying a
5'-biotin residue on one of the strands is then cap-
tured on a solid matrix, taking advantage of the inter-
action between biotin and avidin (Syvanen et al.,
1989). Immobilization of the amplified DNA enables
efficient removal of the excess of PCR primers and
dNTPs, as well as denaturation of the double-
stranded amplified DNA fragment. This is a prerequi-
site for carrying out the detection step primer exten-
sion reaction.

FIG. 1. Principle of the method. 1, Amplification with one biotinylated PCR primer. 2, Affinity-capture. 3, Washing and denaturation.
4, Annealing of the detection step primer. 5, The detection step reaction: a, one label, divided sample; b, two labels, undivided sample. Bio
denotes biotin; the second symbol in the figure denotes avidin or streptavidin.

If the two dNTPs used to detect the nucleotide vari-
ation carry the same label, the sample is divided be-
fore the detection step reaction. Alternatively an un-
divided sample is analyzed with two dNTPs carrying
different labels (Fig. 1). The procedure allows the 
identification of nucleotide substitutions in samples 
from both homozygous and heterozygous individuals. 
In a sample from a heterozygote, a signal is obtained 
from the reactions with both labeled dNTPs. When a 
sample from a homozygous individual is analyzed, 
only one signal corresponding to the nucleotide pres- 
ent is obtained. 
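
The interpretation rule above (two signals for a heterozygote, one for a homozygote) can be written as a simple threshold test. In the sketch below the cutoff of three times background is an illustrative assumption, not a value from this report, and the function name is hypothetical.

```python
def call_site(signal_c, signal_t, background, min_ratio=3.0):
    """Return the set of nucleotides judged present at one variable site.

    A label counts as incorporated when its signal exceeds min_ratio times
    the background. The threshold is illustrative only.
    """
    cutoff = min_ratio * background
    called = set()
    if signal_c > cutoff:
        called.add("C")
    if signal_t > cutoff:
        called.add("T")
    return called

print(call_site(5200, 4800, background=100))
print(call_site(5000, 120, background=100))
```

An empty result would indicate that neither reaction worked and the sample should be repeated.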

Here we applied the method to analyze the three-
allelic polymorphism of the apo E gene. The polymor-
phism of apo E is due to single base substitutions in
the codons for amino acids 112 and 158. The nucleo-
tides of codons 112 and 158 are either CGC (encoding
arginine) or TGC (encoding cysteine) (Fig. 2). The nu-
cleotides at both variable sites were determined from
the same amplified fragment. For this the PCR-am-
plified sample was divided and analyzed using a de-
tection step primer specific for codon 112 and another
detection step primer specific for codon 158.
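
Combining the calls at the two sites gives the genotype. The sketch below builds a lookup from the three common alleles, taking ε2 as Cys/Cys, ε3 as Cys/Arg, and ε4 as Arg/Arg at residues 112/158 (TGC for Cys, CGC for Arg); the names `e2`, `e3`, and `e4` and the function names are illustrative, and any combination outside the three common alleles returns None.

```python
from itertools import combinations_with_replacement

# First nucleotide of codon 112 and of codon 158 for each common allele.
ALLELES = {"e2": ("T", "T"), "e3": ("T", "C"), "e4": ("C", "C")}

def expected_calls(a, b):
    """Nucleotides expected at (codon 112, codon 158) for genotype a/b."""
    return tuple(frozenset({ALLELES[a][i], ALLELES[b][i]}) for i in (0, 1))

LOOKUP = {
    expected_calls(a, b): f"{a}/{b}"
    for a, b in combinations_with_replacement(ALLELES, 2)
}

def genotype(site112, site158):
    """Map observed nucleotide sets at the two sites to a genotype, or None."""
    return LOOKUP.get((frozenset(site112), frozenset(site158)))

print(genotype({"T"}, {"C"}))
print(genotype({"T", "C"}, {"T", "C"}))
```

All six genotypes give distinct pairs of site calls, so the two detection reactions suffice when only these three alleles are present.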

Preparation of Immobilized Single-Stranded DNA 

The specificity of the method is affected by the qual- 
ity and quantity of the biotinylated PCR product. We 
used two consecutive PCRs with nested sets of 
primers. The primary primers (P1 and P4) were 328
bp apart on exon 4 of the apo E gene. The nested 
primers P2 (biotinylated) and P3 amplified a 265-bp 
fragment over the region coding for amino acids 112 
and 158. For comparison, a single PCR with the latter 
primer pair was carried out directly on the genomic 
DNA (Fig. 3). 



                    Amino acid position:
                    112        158

E2                  Cys        Cys
ε2                  TGC        TGC
E3                  Cys        Arg
ε3                  TGC        CGC
E4                  Arg        Arg
ε4                  CGC        CGC

FIG. 2. The apolipoprotein E isoforms and alleles. The protein
isoforms are designated E2, E3, and E4. The corresponding alleles
are ε2, ε3, and ε4. The amino acid residues at positions 112 and 158
and the corresponding codons are shown.

[Figure: positions of the PCR primers P1-P4 and the detection step primers D112 and D158 relative to codon 112 (3745) and codon 158.]

FIG. 3. Location of the polymerase chain reaction and detec-
tion step primers on the apolipoprotein E gene. P1-P4 are PCR
primers; D112 and D158 are the detection step primers for codons
112 and 158, respectively. The numbers in parentheses refer to
nucleotide numbers on the apo E gene according to Paik et al. (20).
The drawing is to scale. The arrows indicate the direction of the
primer extension reaction. Bio denotes biotin.



The quality of the PCR product was assessed by gel
electrophoresis of 1/10th of a PCR reaction. Using
the nested PCR method a single band was always ob-
served, while after a single PCR a few extra bands
were occasionally seen. The amount of PCR product
was estimated by the affinity-based hybrid collection
method (Syvanen et al., 1988). In the two-step PCR
we obtained about 5 pmol of product from the reac- 
tions, whereas the amount of amplified DNA varied 
between 0.1 and 0.8 pmol when only a single PCR was 
used. 

The biotinylated PCR-amplified DNA was cap-
tured on avidin-coated polystyrene particles or in mi-
crotitration wells coated with streptavidin. The biotin
binding capacity of the amount of avidin particles
used is sufficient to capture at least 100 pmol of biotin
(Syvanen et al., 1988). The alkaline treatment used to
remove the nonbiotinylated strand of the amplified
DNA does not disrupt the biotin-avidin interaction.
The biotin binding capacity of the microtitration
wells is lower than that of the microparticles (Harju et
al., 1990). The amount of biotinylated primer in the
second PCR was therefore reduced to 10 pmol when
microtitration wells were used to capture the ampli-
fied DNA.



The Detection Step Reaction 

The detection step primers D112 and D158 are 20-
mers hybridizing to the region immediately upstream
of the variable first nucleotide of codons 112 and 158,
respectively (Table 1, Fig. 3).

The effect of the amount of primer used in the detection
step reaction was tested. Increasing the amount of primer
from an equimolar concentration to a 2-, 5-, and 10-fold
molar excess relative to the immobilized target DNA had
no effect on the signal or the specificity of the reaction.
Omitting the separate annealing reaction decreased the
efficiency of the reaction to one-fourth of that obtained
with the standard technique, i.e., with consecutive
primer-annealing and labeling-termination reactions using
T7 DNA polymerase.

TABLE 1

Sequence and Position on the Apolipoprotein E Gene of the
Oligonucleotide Primers

Primer   Sequence                               Position (a)

P1       5'-AAG GAG TTG AAG GCC TAC AAA T       3616-[illegible]
P3       5'-GAA CAA CTG AGC CCG GTG GCG G       3649-[illegible]
D112     5'-GCG CGG ACA TGG AGG ACG TG          3725-3744
D158     5'-ATG CCG ATG ACC TGC AGA AG          3863-3882
P2       5'-TCG CGG GCC CCG GCC TGG TAC A       3914-3893
P4       5'-GGA TGG CGC TGA GGC CGC GCT C       3943-3922

(a) The positions are given as nucleotide numbers of the apo E
gene according to Paik et al. (20).

The design of the dNTP/ddNTP mixture used in 
the labeling-termination reaction is an important pa- 
rameter of this method. Inclusion of ddNTPs into the 
reaction mix ensures complete termination of the re- 
action. When C at either of the variable sites in the 
apo E gene is to be detected, labeled dCTP and unla- 
beled ddTTP are used; when a T is to be detected, the 
mixture contains labeled dTTP and unlabeled 
ddCTP. In addition, ddGTP, which corresponds to 
the second nucleotide in codons 112 and 158, is in- 
cluded in the reaction mixtures. When dCTP and 
dTTP carrying different labels are used in the same 
reaction, only ddGTP is added. The optimal result 
was obtained with all dNTPs and ddNTPs at 1 µM
concentration. 
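
The rules for composing the labeling-termination mixture can be written out as a short sketch. The function names and the dictionary layout below are illustrative assumptions, not part of the published protocol; the only facts taken from the text are which nucleotide is labeled, which dideoxynucleotides are added, and that G is the second nucleotide of codons 112 and 158.

```python
def single_label_mix(target, second_nucleotide="G"):
    """Nucleotides for detecting `target` (C or T) at the variable site.

    Hypothetical helper: the labeled dNTP matches the nucleotide to be
    detected, and unlabeled ddNTPs terminate the extension at the other
    possible variable nucleotide and at the nucleotide that follows.
    """
    if target not in ("C", "T"):
        raise ValueError("the apo E variable sites carry C or T only")
    other = "T" if target == "C" else "C"
    return {
        "labeled": [f"d{target}TP"],
        "unlabeled_terminators": sorted(
            {f"dd{other}TP", f"dd{second_nucleotide}TP"}
        ),
    }


def double_label_mix(second_nucleotide="G"):
    """Both labeled dNTPs in one reaction; only the ddNTP for the
    nucleotide following the variable site is added."""
    return {
        "labeled": ["dCTP", "dTTP"],
        "unlabeled_terminators": [f"dd{second_nucleotide}TP"],
    }


if __name__ == "__main__":
    print(single_label_mix("C"))
    print(single_label_mix("T"))
    print(double_label_mix())
```

The concentrations are left out on purpose; the text gives 1 µM for all dNTPs and ddNTPs as the optimum found by the authors.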

We compared three DNA polymerases in the labeling-termination
reaction. The results presented in Table 2 show that the T7 and
T. aquaticus DNA polymerases perform satisfactorily. The E. coli
DNA polymerase I (the Klenow fragment) yielded significant
misincorporation of labeled dNTP. The reason for this is probably
the 3'→5' exonuclease proofreading activity of the Klenow DNA
polymerase, which is lacking in the two other enzymes (Tabor and
Richardson, 1987; Innis et al., 1988).

TABLE 2

Comparison of DNA Polymerases in the Detection Step Reaction (a)

Sample           Enzyme (c)   Radioactivity           %
phenotype (b)                 collected (cpm) (d)     misincorporation

E2/E2 (T/T)      T7           107,000
E4/E4 (C/C)      T7             1,600                  1.5
E2/E2 (T/T)      Taq          137,000
E4/E4 (C/C)      Taq            1,800                  1.3
E2/E2 (T/T)      Klenow       138,000
E4/E4 (C/C)      Klenow        28,200                 21.0

(a) The comparison was carried out using 3 units of each enzyme,
the detection step primer D158, and [³H]dTTP as label.

(b) The first nucleotide of codon 158 is given in parentheses.

(c) T7, T7 DNA polymerase; Taq, Thermus aquaticus DNA polymerase;
Klenow, the Klenow fragment of the E. coli DNA polymerase I.

(d) The background values from control reactions without DNA
(240-380 cpm) have been subtracted.

TABLE 3

Detection of the Variable Nucleotides in Codons 112 and 158 of the
Apolipoprotein E Gene with ³⁵S-Labeled dTTP and dCTP

                     Radioactivity incorporated (cpm) (a)
Sample               Codon 112                Codon 158                First nucleotide
No.  Phenotype    T-reaction  C-reaction   T-reaction  C-reaction   Codon 112  Codon 158

1.   E2/E2        105,000        826        42,100        390       T/T        T/T
2.   E3/E2        111,000        680        89,900     16,200       T/T        T/C
3.   E4/E2         38,500     14,?00        27,900      5,720       T/C        T/C
4.   E3/E3         86,500        444           655     43,100       T/T        C/C
5.   E4/E3         50,500      8,370           245     15,300       T/C        C/C
6.   E4/E4          1,050     32,100           395     15,000       C/C        C/C

(a) The background values from control reactions without DNA have
been subtracted (Codon 112: T-reaction, 502 cpm; C-reaction, 214
cpm. Codon 158: T-reaction, 875 cpm; C-reaction, 790 cpm).

Detection of the Polymorphism of the 
Apolipoprotein E Gene 

Using the optimized procedure we analyzed six sam- 
ples, which by isoelectric focusing were first shown to 
correspond to the six possible combinations of the 
three apo E alleles. 

In one series, dNTPs labeled with ³⁵S were used.
Each sample, amplified by the "nested" PCR method,
was collected on avidin-coated microparticles and
divided into four aliquots for the detection step reaction.
The variable nucleotide (C or T) in the first position
of codon 112 was analyzed in two of the aliquots
using the primer D112 and the marker molecule
[³⁵S]dCTP or [³⁵S]dTTP, respectively. Similar conditions
and the primer D158 were used to identify the
variable nucleotide in codon 158. The high positive
signals and low levels of misincorporated label allowed
the unequivocal identification of a C or a T in
both variable sites in all six genotypes (Table 3).
Heterozygous subjects were identified by a positive signal
from both the C- and the T-reaction at the same site
(samples no. 2, 3, and 5). The C-reactions consistently
yielded lower signals than the T-reactions despite the
fact that the input of radioactivity was higher in the
C-reactions. The reason for this is presumably that
the enzyme preferentially incorporates unlabeled
dCTP over thio-[³⁵S]dCTP.
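
The way a genotype is read from the paired T- and C-reaction signals can be sketched as follows. This is only an illustration under stated assumptions: the authors describe reading the calls directly from the counts, and the 10% cut-off relative to the stronger reaction is a hypothetical threshold chosen here because it separates the background-corrected values reported in Table 3.

```python
def call_site(t_cpm, c_cpm, min_fraction=0.10):
    """Call the first nucleotide of a codon as 'T/T', 'T/C' or 'C/C'.

    t_cpm, c_cpm: background-subtracted counts from the T- and C-reactions.
    min_fraction: assumed cut-off; a reaction counts as positive when its
    signal reaches this fraction of the stronger reaction at the same site.
    """
    strongest = max(t_cpm, c_cpm)
    if strongest <= 0:
        raise ValueError("no signal above background at this site")
    t_positive = t_cpm >= min_fraction * strongest
    c_positive = c_cpm >= min_fraction * strongest
    if t_positive and c_positive:
        return "T/C"
    return "T/T" if t_positive else "C/C"


if __name__ == "__main__":
    # Values taken from Table 3: (T-reaction, C-reaction) in cpm.
    print(call_site(105_000, 826))    # sample 1, codon 112
    print(call_site(89_900, 16_200))  # sample 2, codon 158
    print(call_site(655, 43_100))     # sample 4, codon 158
```

Because the C-reactions give lower signals than the T-reactions, a relative threshold of this kind would need to be checked against known controls before use on new samples.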

By combining PCR amplification with the simple
solid-phase detection step reaction, a highly sensitive
and specific technique for identification of single
nucleotide variations in human genes was thus constructed.
From the signals obtained in the experiment
presented in Table 3, in which each amplified sample
was divided into four aliquots, it can be calculated
that mutations present in less than a 1% minority of a
cell population can be identified. A sensitivity of this
range is required when specific minority point mutations,
such as those of the ras oncogene family (Bos et
al., 1987; Farr et al., 1988), are to be detected in cell or
tissue samples containing mutant alleles in the presence
of an excess of the normal gene.

The apo E gene is a particularly favorable target for 
analysis by this method for two reasons. First, the 
close proximity of the variable DNA sites allows their 
co-amplification within a single DNA fragment. Sec- 
ond, only three allelic forms are prevalent in the popu- 
lation, which implies that all three alleles can be un- 
equivocally identified. When more than one variant 
DNA site is to be detected and more than three allelic 
combinations exist, assignment of each variable nu- 
cleotide to the correct allele is beyond the ability of 
the present technique. This problem could be circumvented
by allele-specific amplification of the DNA
fragment (Wu et al., 1989; Gibbs et al., 1989) with a
biotinylated PCR primer prior to the detection step
reaction.
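
The argument about allele assignment can be checked by enumeration. The sketch below lists every pair of alleles and the unphased two-site genotype it would produce. The allele definitions follow Table 3 (E2 = T,T; E3 = T,C; E4 = C,C at codons 112 and 158); the fourth allele "Ex" (C,T) is a hypothetical addition used only to show where the ambiguity arises.

```python
from itertools import combinations_with_replacement

# First nucleotide of codon 112 and codon 158 for each allele (from Table 3).
APO_E_ALLELES = {"E2": ("T", "T"), "E3": ("T", "C"), "E4": ("C", "C")}


def site_genotype(n1, n2):
    """Unordered nucleotide pair at one site, written as T/T, T/C or C/C."""
    return "/".join(sorted((n1, n2), reverse=True))


def genotype_table(alleles):
    """Map each unphased (codon 112, codon 158) genotype to the allele
    pairs that could have produced it."""
    table = {}
    for a1, a2 in combinations_with_replacement(sorted(alleles), 2):
        key = (
            site_genotype(alleles[a1][0], alleles[a2][0]),
            site_genotype(alleles[a1][1], alleles[a2][1]),
        )
        table.setdefault(key, []).append(f"{a2}/{a1}")
    return table


if __name__ == "__main__":
    three = genotype_table(APO_E_ALLELES)
    print(all(len(pairs) == 1 for pairs in three.values()))

    # Hypothetical fourth allele with C at codon 112 and T at codon 158.
    four = genotype_table({**APO_E_ALLELES, "Ex": ("C", "T")})
    print(four[("T/C", "T/C")])
```

With the three prevalent alleles every observed genotype maps to exactly one allele pair. Once the hypothetical fourth allele is allowed, the double heterozygote can no longer be assigned, which is the limitation described in the text.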

Here we applied the technique for the detection of 
single base substitutions in the human genome. Dele- 
tion and insertion mutations in the target DNA can 
be detected equally well. The only restriction is that 
the first nucleotide of the deletion/insertion must not 
be identical to the first nucleotide after the deletion/ 
insertion. 

Double-Labeling Systems

A modification of the technique described above,
involving the use of a double-labeling system, was
also developed. In the modified technique, the immobilized,
amplified sample was divided into two aliquots:
one for analysis with the D112 primer and one for analysis
with the D158 primer. ³H-labeled dTTP and ³²P-labeled
dCTP, diluted in unlabeled dCTP to yield a
specific activity similar to that of the [³H]dTTP, were
used as markers in the same reaction. After completion
of the detection step reaction, the radioactivity
resulting from the incorporated ³H- and ³²P-isotope
was measured simultaneously in the liquid scintillation
counter. Again, the variable nucleotides at both
sites were easily identified in all six samples (Table 4).

TABLE 4

Detection of the Variable Nucleotides in Codons 112 and 158 of the
Apolipoprotein E Gene with ³H-Labeled dTTP and ³²P-Labeled dCTP

                     Radioactivity incorporated (cpm) (a)
Sample               Codon 112                Codon 158                First nucleotide
No.  Phenotype    T-reaction  C-reaction   T-reaction  C-reaction   Codon 112  Codon 158

1.   E2/E2         29,300        286        33,200        722       T/T        T/T
2.   E3/E2         13,400        309        44,600     23,500       T/T        T/C
3.   E4/E2         10,600      3,840        16,100      7,300       T/C        T/C
4.   E3/E3         16,400         72           216     26,400       T/T        C/C
5.   E4/E3          2,640      7,470           455     22,500       T/C        C/C
6.   E4/E4            172     10,000           158     16,500       C/C        C/C

(a) The cpm values from control reactions without DNA have been
subtracted (Codon 112: T-reaction, 215 cpm; C-reaction, 379 cpm.
Codon 158: T-reaction, ?0 cpm; C-reaction, 65 cpm).

The fact that the variable sites of the apo E gene are 
located only 138 bases apart enables detection of the 
nucleotide variations at both loci from the same am- 
plified fragment. To identify nucleotide variations lo- 
cated too distant from each other to be amplified with 
one primer pair, PCR can be carried out using multi- 
ple primer pairs in the same reaction. Such a "multi- 
plex" PCR has been used to simultaneously amplify 
six distant loci in the muscular dystrophy gene 
(Chamberlain et al., 1989). The sensitivity of our
method allows division of one sample into a large 
number of aliquots for analysis of several sites with 
different detection step primers. In such cases, the use 
of a double-labeling system is especially advantageous 
by halving the number of aliquots to be analyzed. 

The high sensitivity of detection allows the use of
³H-labeled dNTPs, which have low specific activities
and the advantages of weak β-emission and a long
half-life (13 years).

Stable Labels 

The applicability of nonradioactive detection of the
different apo E genotypes was also demonstrated (Table 5).
In this case, the amplified DNA was collected
in microtitration wells coated with streptavidin.
Digoxigenin-11-dUTP was used to identify T in codons
112 and 158. The incorporated digoxigenin-11-dUTP
was detected enzymatically by an anti-digoxigenin-alkaline
phosphatase conjugate. The colored end-product
of the enzymatic reaction was measured in a
spectrophotometric reader. The absorbance signals obtained
from the samples with a T as the first
nucleotide in codon 112 (samples 1-5) and codon 158
(samples 1-3) were clearly distinguishable from the
signals generated by the samples with only a C at the
variable site (Table 5). High positive signals were obtained
despite the fact that only a small portion (15%)
of the PCR sample was analyzed per microtitration
well. Other dNTPs modified with haptens such as
fluorescein isothiocyanate or dinitrophenyl groups
(Kumar et al., 1988; Vincent et al., 1982) may analogously
be used. Directly detectable nonradioactive labels,
which avoid the indirect detection of a hapten
with a labeled antibody, would greatly simplify the
procedure. Any label that can be attached to a nucleoside
triphosphate in such a way that the nucleotide is
incorporated in a sequence-specific fashion can be used. In
preliminary experiments we have shown that
ddNTPs modified with derivatives of fluorescein
isothiocyanate (Prober et al., 1987) are applicable to
the present method (Syvanen et al., unpublished).
The sensitivity of detection in a standard fluorometer
(Hitachi, F-1000) is sufficient. In comparison with
dNTPs, labeled ddNTPs provide an additional advantage
in that the labeling and termination steps are
combined into the same reaction. Thus, double-labeling
systems can be applied even in cases where the
nucleotide following the variable site is identical to
one of the nucleotides at the variable site.

TABLE 5

Nonradioactive Identification of T in Codons 112
and 158 of the Apolipoprotein E Gene

Sample             Absorbance at 405 nm (a)    Identification of T
No.  Phenotype     Codon 112   Codon 158       Codon 112   Codon 158

1.   E2/E2         1.29        1.31            +           +
2.   E3/E2         1.31        0.79            +           +
3.   E4/E2         0.32        0.2?            +           +
4.   E3/E3         0.41        0.06            +           -
5.   E4/E3         0.65        0.07            +           -
6.   E4/E4         0.06        0.09            -           -

(a) The absorbance readings from control reactions without DNA
have been subtracted (codon 112, 0.09; codon 158, 0.11).

CONCLUSIONS 

Compared to other methods used to detect nucleotide
changes in the human genome, the method described
here has several major advantages. It comprises
a few simple steps, all of which are carried out in
test tubes or microtitration wells. Because the target
DNA is immobilized before analysis, electrophoretic
separation is avoided. The results are obtained as
numeric values, enabling objective interpretation of the
data. Thus, the method consists of elements suitable
for automation with existing laboratory
robots (Wilson et al., 1988). Furthermore, the high
sensitivity of detection allows the identification of
mutations present in only a small fraction of the
analyzed cells. Nonradioactive detection methods are
applicable, which renders the technique suitable for
hospital and other clinical laboratories. These facts make
the primer-guided nucleotide incorporation assay a
promising tool for the diagnosis of inherited diseases
and for the detection of somatic mutations.
There is increasing agreement that association studies
[...] of single-nucleotide polymorphism (SNP) markers
across the genome, with markers evenly distributed at
~100-kb intervals, would provide the necessary power to
detect small genetic effects for a given complex disease
trait (Collins et al. 1997; Kruglyak 1997). To develop
30,000 or more SNP markers is a priority of the consortium
of National Institutes led by the National Human Genome
Research Institute (Marshall 1997). Although the frequency
of SNPs is approximately 1 in 1000 bp between any two
chromosomes (Cooper et al. 1985; Kwok et al. 1996), there
are currently no efficient ways to find and map them from
scratch.

In general, development of SNP markers requires five
different steps: obtain DNA sequence, develop STSs from
the DNA sequence, screen STSs for SNPs, characterize SNPs,
and map SNPs to specific chromosomal locations. To date,
much effort has been devoted to devising more efficient
ways to screen STSs for SNPs and to characterize them.
Largely ignored is the fact that the most costly aspects
of developing SNP markers are obtaining the DNA sequence
for STS development at the start and mapping the SNPs at
the end of the process. We and others have developed
various strategies to improve the efficiency of this
process by utilizing existing resources to our advantage.
For example, we have screened mapped STSs for SNPs,
thereby reducing the development of SNPs to two steps
(screening for and characterizing SNPs), abrogating the
need for genomic DNA sequencing, STS development, or
mapping (Kwok et al. 1996). Because the mapped STSs were
developed for YAC library screening in physical mapping,
they amplified short DNA fragments, and screening several
of them was required before a SNP could be found.
Moreover, as the physical mapping effort is winding down
over the next 2 years, this resource will not be available
for further SNP marker development.

Corresponding author.
E-MAIL kwok@lm.ivujtI.edu; FAX (314) 362-8159.

GENOME RESEARCH 8:748-754 ©1998 by Cold Spring Harbor
Laboratory Press ISSN 1054-9803/98 $5.00; www.genome.org

SNPs IN OVERLAPPING GENOMIC SEQUENCES

Fortunately, the human genome sequencing effort that is
currently under way provides a better way to develop a
dense set of SNP markers in the genome. Like screening
mapped STSs for SNPs, our approach bypasses the need for
DNA sequence acquisition and mapping for SNPs. In
addition, it eliminates the polymorphism screening step
altogether, leaving only the development of STSs around
the SNPs found during the course of genome sequencing and
their characterization. The strategy is based on the fact
that bacterial artificial chromosomes (BACs) and P1-based
artificial chromosomes (PACs), the substrates of choice
for long-range sequencing, come from diploid libraries.
With an average insert size of ~120 kb, one can expect a
significant overlap between clones selected for
sequencing at ~100-kb intervals. If the clones are from
different libraries (presumably from different
individuals), one is comparing the sequences from two
lineages in the overlapping region. If the clones are
from the same library, there is still a 50% chance that
the overlapping clones are derived from different
lineages (paternal or maternal). This probability could
increase to close to 100% if libraries made from mixtures
of individuals are used. Although one is sampling just
two copies of the same region for polymorphisms in this
approach, the chance of identifying a polymorphism in the
region is the same as the heterozygosity of the
polymorphism in the population. Therefore, for the more
informative markers (defined as having heterozygosity of
>30%), there is a >30% chance of identifying them when
sequences of overlapping clones are examined. Even with a
minimal 10% overlap when clone libraries with 10-fold
genomic redundancy are used to provide sequencing clones,
a 6-kb overlap at each end of the clone insert will
result. Given the general observation that one
informative marker is found per 1.5-2.0 kb in the human
genome (Kwok et al. 1996), at least one such polymorphism
should be found in each 4.5-6-kb overlap. In practice,
the overlaps are much larger than 6 kb because of the
stringent requirements of physical mapping by
fingerprinting in selecting clones for large-scale genome
sequencing (Marra et al. 1997), and as described here,
one can almost certainly find informative SNPs in them.
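
The arithmetic behind the 6-kb overlap and the 4.5-6-kb expectation can be reproduced with a short sketch. The detection probability of one third is an assumption made here to match the stated range (the text says only that the chance exceeds 30% for the more informative markers); the function names are likewise illustrative.

```python
def overlap_per_end_kb(insert_kb, overlap_fraction):
    """Overlap at each end of a clone when the total overlap is split
    evenly between its two neighbours."""
    return insert_kb * overlap_fraction / 2


def expected_spacing_kb(marker_spacing_kb, detection_probability):
    """Average length of overlap needed to observe one informative marker
    when each marker is detected with the given probability."""
    return marker_spacing_kb / detection_probability


if __name__ == "__main__":
    print(overlap_per_end_kb(120, 0.10))
    for spacing in (1.5, 2.0):
        # 1/3 is an assumed detection probability, see the lead-in.
        print(round(expected_spacing_kb(spacing, 1 / 3), 1))
```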

To test the validity of this approach, we have
analyzed in detail every SNP identified while
sequencing three sets of overlapping clones found on
chromosome 5p15.2, 7q21-7q22, and 13q12-13q13.
We report here that this approach of SNP
marker development is highly efficient and cost
effective, requiring only the two simple steps of
developing STSs around the known SNPs and
characterizing them in the appropriate populations.

RESULTS 

In the course of sequencing overlapping BAC and
PAC clones from human genomic libraries at the
Genome Sequencing Center (GSC) at Washington
University in St. Louis, MO, 183 polymorphisms
were found in 200.6 kb in three overlapping regions,
for an average of 1 polymorphism every 1.1
kb (see Table 1). In the 81.8-kb overlap on chromosome 5,
20 polymorphisms were identified. Of
these, 17 polymorphisms were single-base substitution
polymorphisms (85%), 1 was a unique insertion/deletion
polymorphism (5%), and 2 were short
tandem repeat polymorphisms (STRPs) (10%). In the
59.0-kb overlap on chromosome 7, 97 polymorphisms
were found, with 83 being substitution
polymorphisms (86%), 11 insertion/deletion polymorphisms
in a run of a single base such as a poly(A)
(11%), and 3 STRPs (3%). In the 59.7-kb overlap on
chromosome 13, 66 polymorphisms were found,
with 49 substitution polymorphisms (74%), 3
unique insertion/deletions (5%), 12 insertion/deletions
in a run of a single base (18%), and 2
STRPs (3%). Overall, there were 153 SNPs (substitution
and unique insertion/deletion polymorphisms)
at a frequency of 1 per 1.3 kb. In contrast, there were
only 7 STRPs (1 per 28.7 kb).

Table 1. Results of Analyzing 68 SNPs Found in 3 Overlapping Clones

                                                                                  Informative SNPs (c)
Chromosome   Length of     Polymorphisms  SNPs (a)  SNPs analyzed (b)        African-                          Populations
location     overlap (bp)  found                                             American  Caucasian  Hispanic   3         ≥2        ≥1

5p15.2        81,830        20             18       10 (8 STSs, 3892 bp)     6 (60%)   8 (80%)    6 (60%)    6 (60%)   6 (60%)   8 (80%)
7q21-7q22     59,048        97             83       20 (13 STSs, 6360 bp)    14 (70%)  15 (75%)   17 (85%)   12 (60%)  16 (80%)  18 (90%)
13q12-q13     59,739        66             52       38 (23 STSs, 7720 bp)    11 (29%)  8 (21%)    11 (29%)   5 (13%)   10 (26%)  16 (42%)
Totals       200,617       183            153       68 (44 STSs, 18172 bp)   31 (46%)  31 (46%)   34 (50%)   23 (34%)  32 (47%)  42 (62%)

(a) SNPs found among all of the polymorphisms identified in the overlap.
(b) [...] from consideration polymorphisms found [...]
(c) Number of SNPs (and the proportion among the SNPs analyzed) [...]
>20% for the minor allele are categorized according to the individual
populations studied and the number of populations in which these
informative SNPs are found.
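
The densities quoted above follow directly from the overlap lengths in Table 1. A minimal check, using only figures reported in the text (the variable and function names are illustrative):

```python
# Overlap lengths in bp, as listed in Table 1.
OVERLAP_BP = {"5p15.2": 81_830, "7q21-7q22": 59_048, "13q12-13q13": 59_739}


def kb_per_event(total_bp, count):
    """Average spacing in kb between events found in total_bp of sequence."""
    return total_bp / count / 1000


if __name__ == "__main__":
    total_bp = sum(OVERLAP_BP.values())
    print(total_bp)
    for label, count in (("polymorphisms", 183), ("SNPs", 153), ("STRPs", 7)):
        print(label, round(kb_per_event(total_bp, count), 1))
```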

The 153 SNPs were evaluated further for their
usefulness as genetic markers. Those found in common
repeat regions masked by the GSC during the
sequence annotation, such as Alu and L1, were discarded.
In all, 55 SNPs were eliminated by computer
analysis: 5/19 (26%) in the chromosome 5 overlap,
43/83 (52%) in the chromosome 7 overlap, and 7/52
(13%) in the chromosome 13 overlap. The oligonucleotide
selection program (OSP) (Hillier and
Green 1991) was used to design primers to amplify
each of the 98 remaining SNPs. Thirty SNPs were in
regions of DNA in which no suitable amplimers
could be found, including SNPs in repeat regions
other than Alus and L1s and a small number of PCR
failures. In all, 44 STSs were developed to amplify
the remaining 68 SNPs (spanning 18,172 bp of DNA
sequence). Among them, 16 STSs contained 2 SNPs
and 8 STSs contained 3 or more SNPs.

The 68 SNPs were confirmed by sequence analysis
with DNA from a homozygous complete hydatidiform
mole (CHM1) and pooled
DNA samples from 30 individuals each from the
Caucasian, Hispanic, and African-American populations
(Kwok et al. 1994; Taillon-Miller et al. 1997).
The pooled DNA sequencing approach for allele frequency
estimation is highly reproducible and has
been found to give estimates within 5% of the
true allele frequency as found by genotyping all the
individuals in the population pool (Kwok et al.
1994). Allele frequency estimates of the 68 SNPs revealed
that the minor allele in 23 SNPs (34% of the
analyzed SNPs) had a frequency >20% in all three
populations (>32% heterozygosity, assuming
Hardy-Weinberg equilibrium), 32 (47%) had a frequency
>20% in at least two populations, and 42
(62%) had a frequency >20% in at least one population.
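
The link between a 20% minor allele frequency and 32% heterozygosity is the standard Hardy-Weinberg expression 2pq for a biallelic marker. A minimal sketch (the function name is illustrative):

```python
def expected_heterozygosity(minor_allele_frequency):
    """Expected heterozygosity 2pq of a biallelic marker under
    Hardy-Weinberg equilibrium, with p the minor allele frequency."""
    p = minor_allele_frequency
    if not 0.0 <= p <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2 * p * (1 - p)


if __name__ == "__main__":
    print(round(expected_heterozygosity(0.20), 2))
    print(round(expected_heterozygosity(0.50), 2))
```

Heterozygosity rises with the minor allele frequency up to a maximum of 50% at p = 0.5, so any marker above the 20% cut-off exceeds 32%.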

In addition, 18 new SNPs were discovered dur- 
ing the course of analyzing the 44 STSs in 18.2 kb of 
DNA sequence contained in the STSs (found at a rate 
of 1 per 1.0 kb). Among these, 9 (50%) were found 
to be informative in one or more populations and 3 
(17%, 1 per 2.0 kb) were informative in all three 
populations. 

The 26 SNPs with frequencies >20% in all 3 
populations tested are presented in Table 2. Infor- 
mation about the remaining SNPs with their esti- 
mated allele frequencies in each of the three popu- 
lations can be found on our public database (cur- 
rently under construction) accessible through the 
internet (http://www.ibc.wustl.edu/SNP). 

DISCUSSION 

The overall results of this study confirmed that the 
chance of finding informative SNPs by use of over- 
lapping regions of clones sequenced as part of the 
human genome sequencing project was in line with 
our expectations. We had expected to find one in- 
formative SNP per 4.5-6.0 kb, and we found one 
informative SNP per 4.8 kb (3.9 kb if those markers 
discovered during the sequence analysis of STSs for 
characterization of the SNPs were included). Al- 
though it is safe to assume that the ethnic origins of 
the donors of the BAC and PAC libraries are Cauca- 
sian, many of the SNPs found were also polymor- 
phic in the African-American and Hispanic popula- 
tions. These results point to the fact that the more 
informative SNPs are more ancient and are therefore 
informative in most populations (Kimura 1983). 

On closer examination, however, it is clear that
there is a large variation in the frequency of finding
SNPs among the different overlaps. For example,
whereas SNPs are found at a rate of 1 per 4.5 kb in
the 5p15.2 overlap, the rate is 1 per 0.7 kb in the
7q21-7q22 overlap and 1 per 1.1 kb in the 13q12-13q13
overlap. After computer screening, the rate of analyzable
SNPs ranges from 1 in 8.1 kb for 5p15.2, to 1 in
3.0 kb for 7q21-7q22, to 1 in 1.6 kb for 13q12-13q13.

Furthermore, the chromosome 7q21-7q22
overlap was unique in that it had an extremely large
number of repeat elements within which the bulk of
the SNPs were found, leaving only 20 of the 83 SNPs
(24%) suitable for sequence analysis. Among the
analyzable SNPs, >80% were informative in at least
one population in both the chromosome 5 and 7
overlaps. In contrast, 38 of the 52 SNPs (73%) in the
chromosome 13q12-13q13 overlap were analyzable
but only 16 of 38 SNPs (42%) were informative.

Despite these regional differences, however, the 
overall results show that even in the worst case sce- 
nario, one only has to analyze two to three SNPs to 
find an informative SNP marker. Given that almost 
all overlaps produced by large-scale genome se- 
quencing projects are >20 kb, there will be more 
than enough analyzable SNPs to choose from. 
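
The "two to three SNPs" estimate can be reproduced from the Table 1 counts. The sketch assumes, as a simplification not stated in the text, that each analyzable SNP is an independent trial that turns out informative with the observed fraction, so the expected number to test is the reciprocal of that fraction.

```python
def expected_snps_to_test(informative, analyzed):
    """Expected number of SNPs to analyze before one informative marker
    is found, treating each SNP as an independent trial (assumption)."""
    if informative <= 0 or informative > analyzed:
        raise ValueError("need 0 < informative <= analyzed")
    return analyzed / informative


if __name__ == "__main__":
    # (informative in at least one population, SNPs analyzed), from Table 1.
    regions = {"5p15.2": (8, 10), "7q21-7q22": (18, 20), "13q12-13q13": (16, 38)}
    for name, (informative, analyzed) in regions.items():
        print(name, round(expected_snps_to_test(informative, analyzed), 2))
```

The chromosome 13 overlap is the worst case in this data set, at roughly 2.4 SNPs per informative marker.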

Our approach has many advantages not found 
in other current methods. First, all long-range se- 
quencing groups produce high quality sequence 
data, and because every base is sequenced at least 
twice from each clone (so-called double-stranding), 
the error rate is therefore much lower than the poly- 
morphism rate. Second, because the polymorphism 
data are generated by examining existing data, the 
amount of sequencing required is minimal and the 
cost of the project is shifted from identification of 
polymorphisms toward estimates of the usefulness
of the polymorphism by population sequencing. 
Third, because they are derived from long-range se- 
quence data, the markers are precisely mapped, not 
just assigned to an interval of a clone-based contig 
by STS content mapping. Fourth, the physical dis- 
tance between markers is known precisely. Fifth, be- 
cause they are detected when only two chromo- 
somes are examined, each SNP identified in this way 
has a higher chance of being informative. Sixth, this 
approach scales easily because the basic methods 
used are simple and robust, making it possible to 
keep up with the expanding sequencing efforts 
around the world. Consequently, the genetic map 
could be completed along with the sequencing of 
the genome. Seventh, the markers would be intrin- 
sically distributed more evenly than those based on 
genes. 

By use of this approach, a high-density genetic
map with precisely placed SNP markers that are
evenly distributed in the genome can be assembled with
minimal effort and will be available for use to study
complex genetic traits as soon as the genome
sequencing is completed in the year 2005 (Collins and
Galas 1993).



METHODS 
DNA Sequences 

Three overlap regions containing SNPs were identified by the
GSC for this study. On chromosome 5p15.2, an 81,830-bp
overlap between BAC clones CS113H23 (GenBank accession
no. AC003015) and GS33OJ10 (accession no. AC002380); on
chromosome 7q21-7q22, a 59,048-bp overlap between BACs
RG293F11 (accession no. AC000066) and RG1O4F04 (accession
no. AC003086); and the BRCA2 gene region on chromosome
13q12-13q13, a 59,739-bp overlap between PAC clones
257C22A (accession no. AC002525) and 96A18A (accession
no. U73331).



Primary Analysis of SNPs 

The GSC provided us with the ability to access the database 
remotely and the primary assembly data was viewed by use of 
the XGAP program (Bonfield et al. 1995). Up to four aligned 
sequencing traces could be opened and viewed simulta- 
neously for close inspection. In the XGAP program, one can 
set the level of discrepancies at each nucleotide position over 
which it is declared an ambiguous base. When the limit is set 
at 80%, all differences between the consensus sequences from 
the two overlapping clones are designated as ambiguous and 
flagged. Given the fact that the base-calling and assembly 
programs used (PHRED, Ewing et al. 1998; PHRAP, P. Green, 
pers. comm.) take into account the sequence data quality, one 
can easily tell the sequence variations caused by poor data 
quality from the real polymorphisms. These polymorphisms 
were unmistakable because all subclones from one PAC ex- 
hibited one nucleotide but all subclones of the second PAC 
possessed another nucleotide. Because the primary data were 
available for quality check, variations caused by base-calling 
errors were easily eliminated. The sequence context of each 
polymorphism was recorded at this step. Simple sequence
repeats and long runs of poly(A)s or poly(T)s were eliminated
from further consideration.

Annotation and masking of common repetitive elements 
(such as Alu and L1 repeats) were done automatically by the
GSC and these regions were removed from further consider- 
ation. PCR assays were designed for all the remaining SNPs by 
use of the oligonucleotide selection program (Hillier and 
Green 1991). 

[Note: At the GSC, the sequence of only one of the two 
overlapping clones is fully finished and deposited to Gen- 
Bank. Typically, the shotgun sequence data in the overlap- 
ping region found in the second clone sequenced are as- 
sembled into contigs with a few gaps in between. The prefin- 
ished data are archived and are not deposited in a public 
database. However, all of the sequencing traces for both 
clones are freely accessible. Any researcher interested in the 
overlapping sequences is encouraged to contact the GSC for 
access to these data.] 



Determining Frequencies of SNPs in Population Pools 

All PCR assays were amplified against the complete hydatidiform
mole 1 (CHM1) DNA, a completely homozygous DNA
described in detail previously, and the Caucasian, African-American,
and Hispanic population pools (30 anonymous individuals
each) (Taillon-Miller et al. 1997). PCR reaction conditions
and preparation of DNA template for sequencing have
been described previously (Kwok et al. 1994). DNA sequencing
was done with the dichloro-rhodamine dye terminators
analyzed on the 377 DNA Sequencer (PE Applied Biosystems,
Foster City, CA) according to the manufacturer's instructions.
The frequency of each allele in a population pool was determined
by comparing the DNA sequencing trace of a PCR
product amplified from a pooled DNA sample with that of a
PCR product amplified from the DNA sample of the CHM1
(Kwok et al. 1994). The CHM1 DNA serves as a homozygous
control, its normalized peak height equal to a frequency of
100%.
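
The comparison against the homozygous control can be sketched as a ratio of normalized peak heights. The text does not spell out how the normalization is done, so the use of a neighbouring invariant peak in the same trace as the reference is an assumption, as are the function name and the example peak heights.

```python
def pooled_allele_frequency(pool_peak, pool_reference_peak,
                            chm1_peak, chm1_reference_peak):
    """Estimate the frequency of the CHM1 allele in a pooled sample.

    Each allele peak is first normalized to a reference peak from the same
    trace (assumed here to be a nearby invariant base); the normalized
    CHM1 peak is then taken to represent a frequency of 100%.
    """
    if min(pool_reference_peak, chm1_peak, chm1_reference_peak) <= 0:
        raise ValueError("reference and control peaks must be positive")
    pool_normalized = pool_peak / pool_reference_peak
    chm1_normalized = chm1_peak / chm1_reference_peak
    return pool_normalized / chm1_normalized


if __name__ == "__main__":
    # Hypothetical peak heights in arbitrary units.
    frequency = pooled_allele_frequency(300, 1000, 600, 800)
    print(round(frequency, 2))
```

The frequency of the alternative allele would then be one minus this value for a biallelic SNP.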
Point mutations in the ras proto-oncogenes are amongst the most
frequent changes found in human malignancies (1) and may have
prognostic importance. A variety of methods have been published
for the detection of ras mutations (2-4). However, the most
frequently used assays are limited by unsatisfactory
sensitivity (2,3) or by the spectrum of detectable mutations (4), or
they require radioactive labeling (2,3). Here, we present a novel
approach for the detection of ras point mutations based on the
recently described method of peptide nucleic acid (PNA) mediated
PCR clamping (5). PNAs are DNA mimics in which the phosphoribose
backbone is replaced by a peptide-like repeat of
(2-aminoethyl)-glycine units. Due to this chemical difference,
PNAs differ from DNA molecules in several aspects: (i) PNA/DNA
hybrids have a higher thermal stability compared with the
corresponding DNA/DNA hybrids (~1 °C/base for mixed sequences);
(ii) PNA/DNA hybrids are more destabilized by single base pair
mismatches than the corresponding DNA/DNA hybrids (5); and
(iii) PNAs cannot serve as primer molecules in PCR.

The basic idea for using these features to detect ras gene
mutations was to extend the original assay, described for mutations
at a single position (5), to several mutations clustered in a 4-5 bp
span in one PCR. The principle was to hybridize chromosomal
human DNA to a 15mer PNA complementary to the wild-type (wt)
Ki-ras sequence surrounding codons 12 and 13 (schematically
illustrated in Fig. 1). We reasoned that, in the case of wt Ki-ras,
formation of PNA/DNA hybrids would be favoured. The bound
PNA should sterically hinder annealing of a partially overlapping
generic oligonucleotide, thus excluding the normal Ki-ras sequence
from efficient PCR amplification. In the case of mutant alleles, the
melting temperature of the PNA/DNA hybrid would be reduced, thereby
allowing the 23mer oligonucleotide to outcompete PNA annealing
and leading to preferential amplification of mutant sequences.
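
The clamping geometry can be checked with a short sketch that uses the PNA-1 and KRAS-2 sequences given in the legend to Figure 2. It assumes that the PNA, written from its amino terminus, can be read like a 5'→3' DNA strand; the helper names are ours, and the GGT→GAT change is the codon 12 mutation described in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PNA_1 = "TACGCCACCAGCTCC"            # 15mer PNA (legend to Figure 2)
KRAS_2 = "ATCGTCAAGGCACTCTTGCCTAC"   # 23mer generic primer (legend to Figure 2)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(first, second):
    """Length of the longest suffix of `first` that is a prefix of `second`."""
    for k in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:k]):
            return k
    return 0


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Sense-strand target of the PNA; bases 7-9 read GGT (wt codon 12).
wt_target = reverse_complement(PNA_1)
# Codon 12 GGT -> GAT, the mutation used in the dilution experiment.
mutant_target = wt_target[:6] + "GAT" + wt_target[9:]

print("PNA target (sense):", wt_target)
print("primer 3' end / PNA overlap:", suffix_prefix_overlap(KRAS_2, PNA_1), "bases")
print("mismatches vs wt:", mismatches(wt_target, wt_target))
print("mismatches vs GGT->GAT mutant:", mismatches(wt_target, mutant_target))
```

Under these assumptions the 3' end of the primer shares a short stretch of its binding site with the PNA, so a stably bound PNA blocks primer annealing, while a single mismatch in the PNA target is what shifts the competition toward the primer.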

This model was tested on six of 12 possible Ki-ras mutations
in codons 12 and 13 derived from several tumor entities, which
were available to us and had been characterized previously (6,7).
A concentration of 2.84 µM PNA-1 was found sufficient to inhibit
any detectable Ki-ras amplification starting from wt DNA (data not
shown). In contrast, even at 14.2 µM PNA-1 no reduction in
amplification was seen with primers for the human γ-interferon gene
(8) or the human growth hormone gene, indicating the specificity of
this inhibition. Several reaction parameters were evaluated for their
influence on discrimination between mutant and wt Ki-ras alleles,
e.g. the Tm of the competing generic primer, PCR temperature
profiles, the total number of cycles, buffer composition and different
polymerases. Among these, the total number of PCR cycles, the
oligonucleotide annealing temperature and the amount of template
DNA added were most important. A significant improvement was
achieved by addition of glycerol (15% v/v) to the reaction.

Figure 1. Schematic illustration of PNA mediated PCR clamping for the
detection of Ki-ras point mutations (for details, see text).

After optimization the assay was able to detect all tested Ki-ras
mutations irrespective of the site or type of the mutation in a single
PCR (Fig. 2). A 429 bp fragment of the human growth hormone
gene was coamplified in each reaction to exclude unspecific PCR
inhibition in cases with no Ki-ras amplification. We also checked
the sensitivity of our method by diluting a sample carrying a
heterozygous GGT→GAT mutation in codon 12 in wt DNA.
Under the conditions described, the mutation was detected down
to one mutant allele in a background of 200 wt alleles. Direct
sequencing confirmed that the mutant allele was amplified
predominantly (Fig. 3D), whereas sequence analysis of the same sample
reacted without PNA failed to detect the alteration (Fig. 3C).

Figure 2. Detection of different Ki-ras mutations in codons 12 and 13 using
PNA mediated PCR clamping. The 157 bp Ki-ras product in lanes 2-7 indicates
the presence of a mutation in the corresponding tumor, whereas wt-controls
(lanes 8-10, lower band) are negative for Ki-ras, even with a 3-fold excess of
DNA (0.45 µg) added (lane 10). Coamplification of a 429 bp fragment of the
human growth hormone gene (HGH) was done in each reaction to exclude
unspecific inhibition in negative cases (lanes 2-10, upper band). PCR was
performed in 50 µl, containing 100 µM deoxynucleoside-triphosphates, 0.001%
gelatine, 50 mM KCl, 1.5 mM MgCl2 and 10 mM Tris-HCl (pH 8.3), 0.25 µM
KRAS-1 (5'-GTACTGGTGGAGTATTTGATAGTC-3') and KRAS-2
(5'-ATCGTCAAGGCACTCTTGCCTAC-3') primers, 0.12 µM HGH-s
(5'-GCCTTCCCAACCATTCCCTTA-3') and HGH-as
(5'-TCACGGATTTCTGTTGTGTTTC-3') primers, 2.84 µM PNA-1
(H2N-TACGCCACCAGCTCC-CONH2; Perseptive Biosystems, Freiburg, Germany),
7.5% glycerol (v/v) and 0.15 µg template DNA. The mixture was covered with
50 µl paraffin oil. To prevent unspecific polymerization prior to thermal
cycling, hot start was performed by adding Taq polymerase (1 U; Perkin Elmer
Cetus) after 5 min incubation at 94°C. PCR consisted of 28 cycles with
94°C/60 s, 70°C/50 s, 58°C/50 s and 72°C/60 s, with 180 s at 94°C in the
first cycle and an additional final extension cycle with 94°C/1 min and
60°C/10 min. The additional 70°C step was performed in order to achieve
preferential annealing of the PNA. Ten microliters of the reaction were
electrophoresed on a 3% agarose gel and stained with ethidium bromide
(0.5 µg/ml).





Figure 3. Sequence analysis of samples amplified with (D) or without (A, B and
C) PNA reveals preferential amplification of mutant alleles mediated by the
PNA. (A) Wt Ki-ras sequence with GGT in codon 12 derived from human placental
DNA. (B) Heterozygous GGT→GAT mutation in the second position of
codon 12 derived from a liver metastasis of colorectal cancer. (C) Mutant DNA,
diluted 1:10 in human placental DNA, was amplified without PNA; the
mutation is not detectable. (D) The same sample reacted in the presence of
PNA-1; predominantly the mutant allele could be detected. Automated,
fluorescence solid-phase sequencing of PCR products using T7-dye terminator
chemistry [Applied Biosystems (ABI), Foster City, CA, USA] was performed
on a 373A DNA sequencing system (ABI) essentially as recommended by the
manufacturer.



In our view the major advantage of PNA mediated PCR clamping
over published assays is its greater flexibility, which allows
detection of mutations spread over 4-6 bp in a single reaction. In
addition, this method is not restricted to specific base exchanges, a
major drawback of procedures using allele specific amplification
(9). In conclusion, PNA-mediated PCR clamping is an attractive
tool for the detection of ras gene point mutations. Its simplicity and
versatility make it especially helpful in large scale screening
programs. Because their mutations cluster within a few base pairs, ras
genes are ideally suited targets, but the assay could also be a rapid
and sensitive prescreening method for common clustering mutations in
other genes, such as hot-spot mutations in codons 175, 248 or 273 of the
p53 tumor suppressor gene.

ABSTRACT

An accurate and highly sensitive mutation detection 
assay has been developed. The assay is based on the 
detection of mispaired and unpaired bases by immobi- 
lized mismatch binding protein (Escherichia coli 
MutS). The assay can detect most mismatches and all 
single base substitution mutations, as well as small 
addition or deletion mutations. The assay is simple to 
use and does not require the use of either radioactivity 
or gel electrophoresis. 

INTRODUCTION 

A large number of human genetic diseases are caused by small 
genetic alterations, including single base substitutions and small 
additions or deletions. Such mutations may be inherited (inherited 
syndromes), arise de novo in the germline (sporadic diseases) or be 
acquired somatically (e.g. cancers). The development of diagnostic 
tests for such small DNA alterations will facilitate both the 
prevention and treatment of a wide variety of diseases. In addition, 
the ability to scan a large number of DNA samples for small 
differences will be useful for large scale studies of polymorphism 
in human and other species and in identifying unknown genes. 

The methods that, to date, have been most successful in 
detecting small genetic alterations fall into two broad categories: 
(i) those based on sequence- or mismatch-dependent variability 
in electrophoretic mobilities; (ii) those based on proteins capable 
of detecting mispaired bases in heteroduplex DNA (1-7). The 
former class, while reasonably accurate, is technically demanding 
and requires the use of polyacrylamide gel electrophoresis, thus 
making the assays labor intensive, somewhat difficult to automate 
and difficult to apply to the rapid screening of a large number of 
samples. Mismatch detection assays also fall into two broad 
classes: (i) those which involve chemical or enzymatic cleavage 
of mismatch-containing heteroduplexes at the site of a mismatch 
(1,2,5,6); (ii) those which involve binding of mismatch-contain- 
ing heteroduplexes (7). All mismatch cleaving assays and most 
mismatch binding assays require gel electrophoresis. Mismatch 
detection assays involving gel mobility shifts require the 
identification of protein-DNA complexes in polyacrylamide gels 
(3). Cleavage of mismatch-containing heteroduplexes requires 
subsequent identification of specific fragments via gel electro- 



phoresis, as do mismatch binding assays involving nuclease 
protection (4). 

Enzymatic mismatch cleavage recognizes distortions produced 
by disruptions in base pairing, such that those mismatches which 
produce maximal helical distortion and occur in A:T-rich regions 
are best recognized. However, the most frequently occurring 
replication errors arise from mismatches (or unpaired bases) which 
cause minimal helical distortion and occur most frequently in 
G:C-rich regions (8), which may make it difficult for enzymatic 
cleavage methods to detect some of the most commonly occurring 
mutations. 

The specificity of mismatch binding proteins involved in 
mismatch, i.e. replication error, repair in vivo should make them 
ideally suited to mutation detection. Mismatch binding proteins 
recognize best those mismatches and unpaired bases which most 
resemble base pairs and which are, therefore, most likely to occur as 
replication errors. In addition, mismatch binding proteins recognize 
mismatches best in regions of high G:C content. However, the use
of mismatch binding proteins in mutation detection has, heretofore, 
met with limited success. The results reported here indicate that 
immobilized mismatch binding protein exhibits enhanced ability to 
discriminate between DNA with and without mismatches relative to 
mismatch binding protein in solution. Thus immobilization facili- 
tates mutation detection by mismatch binding protein and represents 
a novel approach to mutation/polymorphism detection. The assay is 
simple to use, accurate and readily amenable to automation. 

MATERIALS AND METHODS 

Oligonucleotides 

Oligonucleotides of the sequence
biotin-GCACCTGACTCCTGXGGAGAAGTCTGCCGT were annealed to unlabeled
complementary oligonucleotides to form all possible mismatches 
(heteroduplexes) and a G:C base pair (homoduplex). Heterodu- 
plexes were also prepared with unpaired bases by inserting the 
following bases between positions 15 and 16 of the complemen- 
tary (non-biotinylated) strand of homoduplex molecules: (i) C, 
(ii) CA, (iii) CAG, (iv) CAGG. 
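
As a minimal sketch of the substrate set described above, the following enumerates every base combination at position X of the 30mer and separates Watson-Crick pairs from mismatches. Pairs are written top:bottom, an orientation convention assumed here for illustration; the function and variable names are ours.

```python
from itertools import product

TEMPLATE = "GCACCTGACTCCTGXGGAGAAGTCTGCCGT"  # X marks the variable position
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def top_strand(base):
    """Biotinylated 30mer with `base` substituted at position X."""
    return TEMPLATE.replace("X", base)


def classify_pairs():
    """Sort all 16 top:bottom base combinations at X into two groups."""
    groups = {"matched": [], "mismatched": []}
    for top, bottom in product("ACGT", repeat=2):
        kind = "matched" if (top, bottom) in WATSON_CRICK else "mismatched"
        groups[kind].append(f"{top}:{bottom}")
    return groups


pairs = classify_pairs()
print("variable position:", TEMPLATE.index("X") + 1)
print("duplex length:", len(top_strand("G")))
print("matched pairs:", pairs["matched"])
print("oriented mismatches:", len(pairs["mismatched"]), pairs["mismatched"])
```

Counting oriented pairs matters because the two orientations of a mismatch (for example T:C and C:T) place different bases on the labeled strand and need not be bound equally well.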

Immobilized mismatch binding protein assay 

MutS (Genecheck Inc.; 500 ng/well) in reaction buffer (20 mM
Tris-HCl, pH 7.6, 5 mM MgCl2, 0.1 mM dithiothreitol, 0.01 mM
EDTA) was bound to nitrocellulose pre-wet with reaction buffer
in a 48-well slot blotting apparatus (Hoeffer). Reaction buffer
without MutS was added to control wells. After 20 min at room
temperature nitrocellulose was blocked with 200 µl/well 3%
horseradish peroxidase (HRP)-free bovine serum albumin (BSA).
After 1 h excess blocking solution was removed under vacuum and
DNA (1 and 10 ng) was added in 20 µl reaction buffer plus 3%
BSA. After 30 min at room temperature wells were washed five
times with 100 µl reaction buffer. All washes were poured out
rather than removed under vacuum. Streptavidin-conjugated HRP
(100 µl, 0.05 µg/ml) in reaction buffer plus BSA was added to each
well. After 20 min at room temperature any remaining solution was
poured out and the wells washed five times with 100 µl reaction
buffer as described above. The nitrocellulose sheet was removed
from the apparatus, washed four times with 50 ml reaction buffer,
blotted dry, immersed in 10 ml ECL development solution
(Amersham) for 1 min, blotted dry and exposed to X-ray film.



Human genomic DNA 

Human genomic DNA was PCR amplified to obtain the 
following specific fragments of the human glucokinase gene. 

Exon 3. 

Het-3a. The template was human genomic DNA known to be
heterozygous for a transition mutation (G:C→A:T) in exon 3 of
the glucokinase gene. The DNA was obtained from CEPH (Paris,
France).

Het-3b. The template was human genomic DNA known to be
heterozygous for a transversion mutation (G:C→C:G) in exon 3
of the glucokinase gene. The DNA was obtained from CEPH
(Paris, France).

Hom-3. The template was human genomic DNA presumed to be 
homozygous in exon 3 of the glucokinase gene. The DNA was 
obtained from Sigma. 

The primers were 5'-biotin-GGCTGACACACTTCTCTCT and 
5'-GATGGAGTACATCTGGTGTT. The amplified fragment 
was 150 bp long. 

Exon 6. 

Het-6. The template was human genomic DNA known to be
heterozygous for a transition mutation (G:C→A:T) in exon 6 of
the glucokinase gene. The DNA was obtained from CEPH (Paris,
France).

Hom-6a. The template was human genomic DNA presumed to be 
homozygous in exon 6 of the glucokinase gene. The DNA was 
obtained from Sigma. 

Hom-6b. The template was human genomic DNA known to be 
homozygous in exon 6 of the glucokinase gene. The DNA was 
obtained from CEPH (Paris, France). 

The primers were 5'-biotin-CAGCTTCTGTGCTTCTTG and
5'-TGAAGCCGTTTGTACACAG. The amplified fragment was 
187 bp long. 

Exon 2. 

Het-2. The template was human genomic DNA known to be
heterozygous for a transversion mutation (G:C→T:A) in exon 2
of the glucokinase gene. The DNA was obtained from CEPH
(Paris, France).

Hom-2. The template was human genomic DNA presumed to be
homozygous in exon 2 of the glucokinase gene. The DNA was
obtained from Sigma.

The primers were 5'-biotin-GAAGGTGATGAGACGGAT and
5'-CCCAGGAGATTCTGTCTC. The amplified fragment was
230 bp long.

Figure 1. Binding of synthetic oligonucleotides by immobilized MutS. DNA
(see Materials and Methods) with mismatches or a G:C base pair at position 15,
or with one to four unpaired bases between positions 15 and 16, was bound by
MutS immobilized on nitrocellulose and revealed by chemiluminescence. Data are
from a single experiment. Exposure time was 1 min.

RESULTS AND DISCUSSION 

Mismatch binding protein was immobilized by binding to 
nitrocellulose. The mismatch binding protein utilized in the 
experiments reported here is Escherichia coli MutS, which 
operates in vivo as the mismatch recognizing component of the 
E.coli mismatch repair system (9). Although not all mismatches are
repaired with equal efficiency, either in vivo (10) or in vitro
(11), MutS has been shown to bind in vitro to all mismatches and 
to heteroduplexes with one to four unpaired bases (11,12). 

The results reported here are from experiments in which MutS
was immobilized by binding to nitrocellulose. Other solid supports,
including nylon and PVDF membranes, have been successfully
employed as well (results not shown). The results of experiments
utilizing synthetic 5'-biotinylated 30mers with and without
mismatches or unpaired bases are shown in Figure 1. The sequence of
the 30mers was taken from the β-globin gene at the region
surrounding the sickle cell anemia mutation. The mismatches are
at position 15 and the unpaired bases are between positions 15 and
16. Signals are generated by means of chemiluminescence.
Immobilized mismatch binding protein readily detects all
mismatches except C:C, which is the one mismatch which has been
found to be generally refractory to repair by the E.coli mismatch
repair system, both in vivo and in vitro (10). Heteroduplexes with
one or two unpaired bases are readily detected. Heteroduplexes
with three or four unpaired bases are somewhat less well detected.
With immobilized MutS there is excellent discrimination between
mismatched and non-mismatched oligonucleotides: the ratio of
binding of G:T-containing to perfectly matched oligonucleotides
(i.e. the ratio of the lowest concentrations at which a signal is
detected) is of the order of 1000:1, whereas the ratio of binding
with MutS in solution is only ~5:1 (Fig. 2 and Table 1).

Figure 2. Binding of heteroduplex and homoduplex 30mers by immobilized
MutS. DNA and assay conditions were as described in Figure 1 except that
exposure time was 20 s. Data are from a single experiment.



Table 1. Binding of mismatch-containing DNA by MutS in solution

DNA (ng)    G:C base pair    G:T mismatch    Ratio
            (c.p.m.)         (c.p.m.)        mismatch:base pair

 0.1          80              142            1.8
 1.0         242             1252            5.2
10.0        1403             7236            5.2


Biotinylated oligonucleotides (described in the legend to Fig. 1) were labeled
with 32P by T4 polynucleotide kinase. Labeled oligonucleotides were annealed
with unlabeled oligonucleotides as described in the legend to Figure 2 to
produce 30mers without mismatches (homoduplexes, G:C) or heteroduplexes with
G:T mismatches at position 15.

MutS (500 ng) was incubated in 20 µl reaction buffer (20 mM Tris-HCl, pH 7.6,
5 mM MgCl2, 0.1 mM dithiothreitol, 0.01 mM EDTA) with DNA at room
temperature for 30 min. The mixtures were spotted onto 25 mm nitrocellulose
filters pre-wet with reaction buffer. The filters were washed five times with
2 ml reaction buffer by vacuum filtration and dried at 80°C for 15 min. Each
filter was placed in 3 ml scintillation fluid and the radioactivity determined
by scintillation counting. Background counts (no MutS) were subtracted. The
results presented are the means of duplicate or triplicate experiments.
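
The ratio column of Table 1 can be recomputed directly from the c.p.m. values as a consistency check. The sketch below simply divides mismatch counts by base-pair counts and rounds to one decimal; the variable and function names are ours.

```python
# (DNA in ng, G:C base pair c.p.m., G:T mismatch c.p.m.) from Table 1
TABLE_1 = [
    (0.1, 80, 142),
    (1.0, 242, 1252),
    (10.0, 1403, 7236),
]


def binding_ratios(rows):
    """Return (ng, mismatch:base pair ratio) rounded to one decimal."""
    return [(ng, round(mismatch / base_pair, 1))
            for ng, base_pair, mismatch in rows]


ratios = binding_ratios(TABLE_1)
for ng, ratio in ratios:
    print(f"{ng:>5} ng  ratio {ratio}")
```

The recomputed ratios agree with the figure of roughly 5:1 quoted in the text for MutS in solution at the two higher DNA amounts.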

The finding that T:C mismatches are better detected than C:T 
mismatches suggests that mismatch recognition may depend on 
the sequence of the individual strands, i.e. the sequence in the 
vicinity of the mismatch and the orientation of the mismatch with 
respect to the strand, at least in relatively small oligonucleotides 
such as those used in these experiments. However, the detectable 
mismatches (all mismatches except C:C) are detectable indepen- 
dent of orientation. In addition, G:T and T:G mismatches are 
detected equally well, suggesting that well-detected mismatches 
are well detected independent of strand orientation. It cannot be 
excluded that some of the variation observed in the extent of 
binding is due to errors in the oligonucleotide synthesizing 
process. Some 30mers have been observed to give signals when 
in a homoduplex configuration (data not shown). However, these 
signals are generally weaker than the weakest signals considered 
to be indicative of mismatch-specific binding, i.e. weaker than or 
equal to the C:C signal in Figure 1. 




Figure 3. Detection of heterozygotes in the human glucokinase gene. Human
genomic DNA from heterozygotes and homozygotes in the human glucokinase
gene was amplified with primers specific to regions of exons 2, 3 and 6 (see
Materials and Methods). Annealed PCR products were used in assays with
immobilized MutS as described. Only DNA from heterozygotes should contain
mismatches as indicated. Data from exons 3 and 6 are from a single experiment.
Data from exon 2 are from a separate experiment. PCR mixture (100 µl): 10 mM
Tris-HCl, pH 8.3, 1.5 mM MgCl2, 0.2 mM dNTPs, 1.0 µM primers, 200 ng
template DNA, 2.5 U Taq polymerase (Boehringer Mannheim). Thirty cycles:
denaturation, 1 min at 94°C; annealing, 1 min, three cycles at 62°C, three
cycles at 60°C, three cycles at 58°C, three cycles at 56°C, 18 cycles at 54°C;
extension, 2 min at 72°C. Primers were removed with QIAquick spin columns
(Qiagen). PCR products were eluted in 10 mM Tris-HCl, pH 8.0, adjusted to
0.1 M NaCl. Denaturation was at 100°C for 3 min. Annealing was for 1 h at
55°C, 4 min at 75°C and 30 min at 55°C. DNA was stored at 4°C until use.
Exposure time 30 s.
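
The stepped annealing profile in this legend can be written out as a per-cycle schedule, which makes it easy to confirm that the steps add up to thirty cycles. This is a sketch only: the function name is ours, and the time estimate counts programmed hold times alone, ignoring ramping between temperatures.

```python
def annealing_schedule(steps):
    """Expand (annealing temperature in °C, number of cycles) pairs
    into one annealing temperature per cycle."""
    schedule = []
    for temp_c, n_cycles in steps:
        schedule.extend([temp_c] * n_cycles)
    return schedule


# Annealing steps as listed in the legend to Figure 3.
FIGURE_3_STEPS = [(62, 3), (60, 3), (58, 3), (56, 3), (54, 18)]

schedule = annealing_schedule(FIGURE_3_STEPS)

# Hold times per cycle in minutes: denaturation 1, annealing 1, extension 2.
hold_minutes = len(schedule) * (1 + 1 + 2)

print("cycles:", len(schedule))
print("first cycles:", schedule[:6])
print("programmed hold time (min):", hold_minutes)
```

Starting with a high annealing temperature and stepping it down favors specific priming in the early cycles, when the choice of template matters most.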



The failure to bind C:C mismatches to a significant extent does 
not diminish the utility of this method for mutation detection, since 
every wild-type/mutant pairing gives rise to two different mis- 
matches (e.g. G:G and C:C). G:G mismatches give strong signals. 

Mutations in the human glucokinase gene are responsible for 
non-insulin-dependent diabetes (13). Regions of three glucoki- 
nase exons were PCR amplified from human genomic DNA 
known to be heterozygous for mutations in those regions and 
from human genomic DNA known or presumed to be homozy- 
gous for the wild-type sequence in those regions. In each case one 
of the primers contained a 5'-biotin, allowing detection by 
chemiluminescence. The same primers were used to amplify both 
heterozygous and homozygous genomic DNAs and the ampli- 
fications were performed simultaneously. Estimates of DNA 
quantities in the PCR products were obtained by polyacrylamide 
gel electrophoresis. (The DNA quantities shown in the figures are 
approximate. For the immobilized mismatch binding protein assay 
to produce accurate results it is sufficient to establish accurate 
relative quantities for homozygote and heterozygote comparisons. 
Any method capable of accurately obtaining such relative 
quantitation of the PCR products would be equally suitable.) The 
DNAs were denatured by heating, allowed to re-anneal and tested
for the presence of mismatches, i.e. heterozygotes, by testing their
binding in an immobilized mismatch binding protein assay
utilizing E.coli MutS.

The results are presented in Figure 3. In each case heterozygotes
can be clearly distinguished from homozygotes. The actual ratios
of mismatch-containing DNA binding to mismatch-free DNA
binding (i.e. heteroduplex binding to homoduplex binding) are
approximately twice the apparent ratios seen in Figure 3, since the
heterozygote samples were randomly annealed. Therefore, half the
molecules will be heteroduplexes and half will be homoduplexes.
The strength of the heterozygote signal appears to be
mismatch-dependent. In the case of exon 3, where two different mismatch
pairs were studied, a strong signal is observed when the
mismatches formed are G:T and C:A (Het-3a), whereas a
somewhat weaker signal is observed with G:G and C:C
mismatches (Het-3b), presumably due to the fact that only G:G
mismatches are detected. The intermediate strength signal
observed with the exon 2 fragment (Het-2) may reflect mismatch
specificity, i.e. G:A and C:T mismatches appear to be somewhat
less well recognized than G:T and A:C mismatches. However, the
signal may also be somewhat lower because the molar concentration
of mismatches is lower in the exon 2 fragment experiment than
in the exon 3 fragment experiment, i.e. equal quantities of DNA
were used and the fragments differ in length (230 versus 150 bp,
respectively).

Figure 4. Comparison of products of different PCR polymerases using the
immobilized MutS assay. DNA was amplified from human genomic DNAs heterozygous
or homozygous in exon 2 of the glucokinase gene (see Materials and Methods).
All data are from a single experiment. PCR mixtures (100 µl): 0.25 mM dNTPs,
0.2 µM primer 1, 0.2 µM primer 2, 200 ng template DNA. PWO polymerase
(Boehringer Mannheim): 10 mM Tris-HCl, pH 8.85, 25 mM KCl, 5 mM (NH4)2SO4,
6 mM MgSO4, 5 U DNA polymerase. Vent polymerase and Vent + exonuclease: 20 mM
Tris-HCl, pH 8.8, 10 mM KCl, 2 mM MgSO4, 10 mM (NH4)2SO4, 0.1% Triton X-100,
2 U DNA polymerase. Thirty cycles: denaturation, 1 min at 94°C; annealing,
1 min, three cycles at 64°C, three cycles at 62°C, three cycles at 60°C, three
cycles at 58°C, three cycles at 56°C, 15 cycles at 54°C; extension, 2 min at
72°C. Primers were removed with QIAquick spin columns (Qiagen). PCR products
were eluted in 10 mM Tris-HCl, pH 8.0, adjusted to 0.1 M NaCl. Denaturation
was at 100°C for 2 min. Annealing was for 1 h at 55°C, 4 min at 75°C and
30 min at 55°C. Exposure time 30 s.
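
Two quantitative points in the discussion of Figure 3, the factor of two from random annealing and the effect of fragment length on molar concentration, can be made explicit with a short sketch. It assumes that all strands re-anneal at random and with equal efficiency; the function names are ours.

```python
def heteroduplex_fraction(mutant_allele_fraction):
    """Fraction of re-annealed duplexes carrying a mismatch when strands
    from two alleles pair at random (2pq for allele fractions p and q)."""
    p = mutant_allele_fraction
    return 2 * p * (1 - p)


def relative_molar_concentration(length_bp, reference_length_bp):
    """Molar concentration of a fragment relative to a reference fragment
    when equal masses of DNA are used (moles scale as 1/length)."""
    return reference_length_bp / length_bp


het = heteroduplex_fraction(0.5)   # heterozygote: both alleles at 50%
correction = 1 / het               # apparent -> actual binding ratio
exon2_vs_exon3 = relative_molar_concentration(230, 150)

print(f"heteroduplex fraction in a heterozygote: {het:.2f}")
print(f"correction factor for apparent ratios: {correction:.1f}")
print(f"exon 2 molar concentration relative to exon 3: {exon2_vs_exon3:.2f}")
```

The same expression shows why homozygous mutant DNA gives no signal on its own (a mutant fraction of 1 yields no heteroduplexes) and why mixing it with known wild-type DNA before annealing restores one.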

There is significantly increased binding of homoduplex DNA 
in these experiments relative to those with 30mer oligonucleo- 
tides (Fig. 2). It may be that the biotinylated primers occasionally 
initiate replication at sites other than the selected site. These 
fragments would be labeled and might be bound by immobilized 
MutS, either because they form mismatches when annealed with 
the genomic DNA from the homologous chromosome or because 
they form some secondary structure with mismatches. Alterna- 
tively, the homoduplex binding may be the result of polymerase 
errors or DNA damage occurring during amplification. Polymer- 
ase errors would be expected to occur relatively randomly 
throughout the amplified fragment, such that they would not be 



detectable by sequencing, but the cumulative effect of such errors 
could be to produce a sizable fraction of PCR products with some 
error. These would generally produce mismatches when dena- 
tured and annealed and thus contribute to positive signals in the 
immobilized mismatch binding protein assay. However, when the 
exon 2 fragment is amplified by four different polymerases, some 
of which have increased fidelity of replication and should, 
therefore, have a reduced rate of production of error-containing 
fragments, the ratio of heteroduplex to homoduplex binding does 
not change significantly (Figs 3 and 4). 
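
How random polymerase errors could accumulate into "a sizable fraction" of error-containing products can be illustrated with a simple independence model. The error rates and the number of duplications below are hypothetical values chosen only for illustration; none of them is taken from the text.

```python
def fraction_with_error(error_rate, length_bp, duplications):
    """Fraction of product strands carrying at least one misincorporation.

    Assumes every base is copied independently with the same per-base,
    per-duplication `error_rate`, and that a strand has been copied
    `duplications` times since the original template.
    """
    return 1.0 - (1.0 - error_rate) ** (length_bp * duplications)


# Hypothetical parameters: a 230 bp fragment copied 20 times.
low_fidelity = fraction_with_error(1e-4, length_bp=230, duplications=20)
high_fidelity = fraction_with_error(1e-6, length_bp=230, duplications=20)

print(f"error-containing fraction at 1e-4 per base: {low_fidelity:.1%}")
print(f"error-containing fraction at 1e-6 per base: {high_fidelity:.2%}")
```

Under such a model a higher-fidelity enzyme should sharply reduce background from error-containing molecules, which is why an unchanged heteroduplex to homoduplex ratio across polymerases argues against polymerase error as the main source of homoduplex binding.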

The results presented here are concerned only with the detection 
of heterozygous mutations. The detection of homozygous mutations 
can easily be accomplished by adding known homozygous DNA to 
the test DNA before denaturation and annealing, either before or 
after amplification. Thus the use of immobilized mismatch binding 
protein assays for mismatch, mutation, heterozygosity or poly- 
morphism detection involving single base substitutions and small 
additions or deletions seems to be limited only by the need to provide 
substrates free of labeled DNA with random mismatches, as 
discussed above. Immobilized mismatch binding protein provides a 
simple, accurate and easy to automate system for the following. 

(i) Diagnostic screening for any disease causing mutation (or 
mutations), including single base substitutions and small additions 
or deletions, for which the sequence and location of the mutation(s) 
are known. It is possible to detect both carriers (heterozygotes) and 
affected patients (homozygotes) and to distinguish between them. 

(ii) Rapid and large scale screening of human (or other)
genomic DNA for single base change or small addition/deletion
polymorphisms. The ease and speed of the system make it
possible to screen large numbers of individuals and to construct
high resolution maps based on genomic polymorphism.

In addition, it may be possible to use immobilized mismatch 
binding protein to remove error-containing molecules from PCR 
samples, to bind heterozygous sequences to allow determination 
of identity by descent and to study closely related varieties and/or 
species to characterize biodiversity. 

Specific nucleic acid sequences such as rRNAs, tRNAs, mRNAs,
quences whose mRNAs are not abundant are very difficult to isolate.

One promising alternative to the use of isolated naturally
occurring nucleic acid probes is the use of chemically synthesized
oligodeoxyribonucleotide sequences. Recently, a specific
13 nucleotide sequence complementary to the gene for yeast
iso-1-cytochrome c was used to detect and isolate the gene sequence
cloned in a bacteriophage λ vector (7). This was possible because
the actual nucleotide sequence of a region of the gene
could be deduced from genetic information. As a more general
approach to the use of oligodeoxyribonucleotides as probes, we
propose to use a chemically synthesized mixture of oligonucleotides
whose sequences represent all possible codon combinations
predicted from a particular peptide sequence within a protein.
One of this mixture must be complementary to a region of DNA
coding for the protein. Stringent hybridization criteria would
then be used to select the single correct sequence from the mixture.
As a preliminary investigation, we have chosen to study
the hybridization behavior of three oligonucleotides, 11, 14,
and 17 bases long, to DNA from wild type (wt) and am-3
bacteriophage φX174. The three oligonucleotide sequences are
complementary to wt DNA at the region encompassing the am-3 point
mutation (8). Duplexes formed between the oligonucleotides and am-3
DNA contain a single mismatched base pair. This system represents
a useful model for the study of the effect of mismatched
base pairs on duplex formation and stability.
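
The proposed mixed-probe approach can be sketched by enumerating every coding sequence consistent with a short peptide. The peptide used here (Met-Trp-Lys-Asp) is a hypothetical example chosen for its low degeneracy, and the standard genetic code is assumed; the function names are ours.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODONS_FOR = {}
for index, codon in enumerate(product(BASES, repeat=3)):
    CODONS_FOR.setdefault(AMINO_ACIDS[index], []).append("".join(codon))


def coding_oligos(peptide):
    """All DNA sequences (sense strand) that could encode `peptide`."""
    choices = [CODONS_FOR[residue] for residue in peptide]
    return ["".join(combo) for combo in product(*choices)]


def complementary_probes(peptide):
    """Reverse complements of the coding sequences, i.e. probes that
    would hybridize to the sense strand."""
    complement = str.maketrans("ACGT", "TGCA")
    return [seq.translate(complement)[::-1] for seq in coding_oligos(peptide)]


oligos = coding_oligos("MWKD")   # hypothetical peptide, one-letter code
probes = complementary_probes("MWKD")
print(len(oligos), "candidate sequences:", oligos)
```

Peptide stretches rich in Met and Trp, which have single codons, keep the mixture small; residues such as Leu, Ser or Arg, with six codons each, enlarge it quickly.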

MATERIALS AND METHODS

Synthesis of Oligodeoxyribonucleotides

The oligodeoxyribonucleotides were synthesized by the modified
triester method (9). Their use in synthetic DNA directed
base change of φX174 DNA has been described previously (10).
The oligonucleotides were gifts of Genentech, Inc., San Francisco.

Preparation of Phage DNA

The φX174 wt and am-3 DNAs were gifts from Dr. Aharon Razin.
The DNAs were isolated from purified phage as described (11).

Labeling of Oligonucleotides

The synthesis leaves the oligonucleotides with a free 5'-OH. The
oligonucleotides were labeled by transferring the 32P from [γ-32P]ATP
(1000 Ci/mmole) with polynucleotide kinase (Boehringer Mannheim)
as described (12). Oligonucleotide (0.2 µg) was labeled and
separated from [γ-32P]ATP by chromatography on Sephadex G-50.
The excluded peak was pooled and used directly in hybridization
experiments.

Preparation and Hybridization of DNA filters 

The DNA was incubated in 0.2 N NaOH at 37°C for 30 minutes,
neutralized with HCl and brought to 6 X SSC at 0°C. Nitrocellulose
filters (Sartorius Membranfilter, 2.5 cm, pore size 0.45 µm)
were wet in H2O, washed in 6 X SSC (1 X SSC = 0.15 M NaCl,
0.015 M sodium citrate, pH 7.2) and placed on a filter apparatus.
The amount of DNA loaded on the filters depended on the
nature of the experiment. For the thermal denaturation experiments,
0.05 µg of wt and am-3 DNA were applied to each filter.
For kinetic experiments, 0.005 µg wt DNA and 0.025 µg am-3 DNA
were applied to each of ten filters. After application of DNA,
the filters were baked at 80°C in vacuo for 4 hours.

For hybridization, the filters were placed in 6 X SSC, 10 X
Denhardt's [10 X Denhardt's = 0.2% bovine serum albumin (Sigma),
0.2% polyvinylpyrrolidone (Sigma), 0.2% Ficoll (Sigma) (13)] at
room temperature for 15 minutes. The solution was drained and
replaced with 2 ml of hybridization solution (6 X SSC, 10 X
Denhardt's, 0.002 µg/ml 32P-labeled oligonucleotide). Unless
otherwise stated, hybridization was performed at 12°C for 16
hours. The filters were then washed with multiple changes of 6
X SSC at 12°C, until no more radioactivity eluted.

Thermal Denaturation

Filters which had been hybridized and thoroughly washed as 
described above were used for thermal denaturation studies as 
follows: 5 ml of 6 X SSC was placed over the filter and the 
temperature raised to a specific point. Once the temperature
had been reached, the filter was kept at that temperature for
one minute; the 6 X SSC was then removed for measurement of the
radioactivity eluted (in Aquasol 2, New England Nuclear). An
additional 5 ml of 6 X SSC was then added and the procedure re- 
peated until the desired maximum temperature was reached. 

The radioactivity eluted at each temperature was integrated
to determine the fraction of the duplex denatured as a function
of temperature. In order to determine the temperature at which
one half of the oligonucleotide dissociates from the filter (Td),
these data were fitted to the error function by a nonlinear
least squares fitting program as described previously (14,15).
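
The integration and fitting steps described above can be sketched as follows. This is not the fitting program of refs. 14 and 15: it is a minimal standard-library illustration that integrates the per-step counts into a cumulative fraction and fits an error-function curve by a simple grid refinement (a stand-in for a proper nonlinear least squares routine). The function names, the `width` parameter and the grid settings are assumptions made for the example.

```python
import math


def erf_model(temp, td, width):
    """Fraction of duplex dissociated at `temp`; `td` is the midpoint."""
    return 0.5 * (1.0 + math.erf((temp - td) / (width * math.sqrt(2.0))))


def cumulative_fraction(counts):
    """Integrate counts eluted at each step into a fraction of the total."""
    total = float(sum(counts))
    running, fractions = 0.0, []
    for c in counts:
        running += c
        fractions.append(running / total)
    return fractions


def fit_td(temps, fractions, rounds=6, grid=30):
    """Least-squares estimate of (td, width) by repeated grid refinement."""
    def sse(td, width):
        return sum((erf_model(t, td, width) - f) ** 2
                   for t, f in zip(temps, fractions))

    td_lo, td_hi = min(temps), max(temps)
    w_lo, w_hi = 0.5, max(temps) - min(temps)
    best_td, best_w = td_lo, w_lo
    for _ in range(rounds):
        best_err = float("inf")
        for i in range(grid + 1):
            td = td_lo + (td_hi - td_lo) * i / grid
            for j in range(grid + 1):
                w = w_lo + (w_hi - w_lo) * j / grid
                err = sse(td, w)
                if err < best_err:
                    best_err, best_td, best_w = err, td, w
        # Shrink the search window to two grid steps around the best point.
        td_step = (td_hi - td_lo) / grid
        w_step = (w_hi - w_lo) / grid
        td_lo, td_hi = best_td - 2 * td_step, best_td + 2 * td_step
        w_lo, w_hi = max(best_w - 2 * w_step, 1e-3), best_w + 2 * w_step
    return best_td, best_w


if __name__ == "__main__":
    # Hypothetical wash temperatures (°C) and counts eluted at each step.
    temps = [20, 25, 30, 35, 40, 45, 50, 55, 60]
    counts = [50, 150, 600, 1900, 3300, 2600, 1000, 300, 100]
    td, width = fit_td(temps, cumulative_fraction(counts))
    print(td, width)
```

The counts in the usage example are invented for illustration and are not data from the experiments described here.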

In each experiment, essentially all of the bound
oligonucleotide was removed by the highest temperature wash. In
addition, none of the φX174 DNA bound to the filter was lost
during the procedure, since an equal amount of 32P-labeled
oligonucleotide will hybridize to the same filter in a second or
third experiment.

Agarose Gel Electrophoresis 

Vertical agarose gels (SeaKem) were used to separate DNA
samples for transfer to nitrocellulose. The gels were 15 cm X
15 cm X 0.2 cm and were electrophoresed at 200 volts for 2 hours.
Undigested DNA was separated on 1% agarose gels while Hae III
digested φX174 am-3 RFI DNA (Bethesda Research Laboratories) was
separated on 2% agarose gels. After electrophoresis, gels were
stained with 0.1 µg/ml ethidium bromide (Calbiochem) for 30
minutes and photographed through an orange filter under an
ultraviolet light source. The DNA was denatured by soaking the
gel in 0.4 N NaOH, 0.8 M NaCl for 30 minutes. The gel was
neutralized in 0.5 M tris-HCl, pH 7.6, 1.5 M NaCl for 30 minutes
in the cold and the DNA transferred to nitrocellulose filters
(Millipore HAWP 00010) as described by Southern (16). The filters
were baked at 80°C in vacuo for at least 4 hours.

The filters were hybridized in 6 X SSC, 10 X Denhardt's,
0.002 µg/ml 32P-labeled oligonucleotide at the temperatures
specified in each experiment. After hybridization the filters
were washed in 500 ml 6 X SSC at 12°C, blotted dry and
autoradiographed using pre-flashed Kodak XR-2 X-ray film exposed
between 2 intensifier screens (Cronex Lightning Plus, Dupont) at
-80°C for 12-36 hours.

RESULTS 

Thermal Stability of Oligonucleotide-φX174 DNA Duplexes

In order to study the hybridization of synthetic
oligodeoxyribonucleotides to natural DNA, we synthesized three
oligonucleotides of chain length 11, 14, and 17 which are
complementary to the single stranded DNA (+ strand) of the wild
type (wt) bacteriophage φX174.

The 11 mer and 14 mer are synthetic intermediates of the 17
mer. The 17 mer is complementary to nucleotides 575 through 591
in the linear sequence of φX174 DNA reported by Sanger and
coworkers (8) (Figure 1). These sequences represent useful models
for the study of single base pair mismatch since duplexes formed
with am-3 φX174 DNA contain one A-C base pair [amber mutation is
a G→A transition at position 587 of the DNA sequence (Figure 1)].

For hybridization of 32P-labeled oligonucleotides to wt and
am-3 φX174 DNA, the phage DNA was immobilized on nitrocellulose
filters. Initially, hybridizations were performed at 12°C in 6
X SSC, 10 X Denhardt's ([Na+] = 1.2 M) (see Materials and
Methods). From previously published results (17-20), the duplexes
formed between the 11 mer, 14 mer or 17 mer and wt DNA were
expected to be rather stable with Tm's greater than 30°C. Under
the conditions of the hybridization or the subsequent washing of
the filters (in 6 X SSC), the oligonucleotides do not adhere
nonspecifically to the nitrocellulose (see Figure 1).

[Figure 1. Left: schematic of the 11 mer, 14 mer and 17 mer
aligned with am-3 φX174 DNA. Right: autoradiograph with size
markers (base pairs): 1,342; 1,078; 872; 310; 278, 271; 234; 194.]

FIGURE 1, A representation of the mismatched duplexes formed
between the three oligonucleotides and am-3 φX174 DNA. On the
right, am-3 φX174 RFI DNA, which was digested with Hae III, was
electrophoresed on a 2% agarose gel and blotted onto
nitrocellulose as described in Materials and Methods. The filter
was hybridized to 32P-labeled 14 mer at 12°C, washed at 12°C and
autoradiographed. It can be seen that hybridization is to the 234
base pair long restriction fragment which contains the am-3
mutation at nucleotide 587 (8).

The hybridization of all 3 oligonucleotides to wt φX174 DNA
was quite efficient. Between 13 and 22% of the sites on the
phage DNA molecules hybridize with the labeled oligomers (Table
1). The stability of the oligonucleotide-wt DNA duplexes was
examined by thermal denaturation. Filters which had been
hybridized and washed at 12°C were heated to various temperatures
in 6 X SSC and the radioactivity which eluted was measured. The
thermal denaturation profiles are presented in Figure 2. The
data are summarized in Table 1. Note that the parameter Td, the




[Figure 2 plot; x-axis: Temperature (°C), 10 to 90.]

FIGURE 2, Thermal denaturation of oligonucleotide-wild type
φX174 DNA duplexes. The 11 nucleotide (•), the 14 nucleotide
and the 17 nucleotide (X) probes shown in Figure 1 were labeled
with [32P] in the 5' end and hybridized to wild type φX174 DNA
immobilized on nitrocellulose filters. The hybridization was
performed in 6 X SSC, 10 X Denhardt's (13) and 0.002 µg probe/ml.
The filters were washed in 6 X SSC and subjected to thermal
denaturation. The radioactivity eluted at each temperature was
measured and is plotted as the fraction of the total probe
becoming single stranded at each temperature.
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Table 1. 






Number        φX174 DNA    Number
Nucleotides   Hybridized   Nucleotides   % Sites                 Td Observed
In Probe                   In Duplex     Hybridized   % G+C      (°C)

11            wt           11            20           46         33.2 ± 0.4
11            am-3         10
14            wt           14            13.6         50         40.6 ± 0.7
14            am-3         13            2.9          43         31.1 ± 0.8
17            wt           17            22.6         59         55.1 ± 0.8
17            am-3         16            11.3         53         43.5 ± 0.3


temperature at which one half of the duplexes are dissociated,
is used rather than Tm since the experiment does not allow
direct measures of Tm in a thermodynamically rigorous way. As
expected, an increase in thermal stability is seen with an
increase in duplex length.

Compared to the wt DNA, hybridization of the 3
oligonucleotides to am-3 DNA is much less efficient (Table 1). In
fact the level of hybridization of the 11 mer to am-3 DNA was
barely above background and determination of an accurate Td was
not possible. The thermal denaturation profiles of the
oligonucleotide-am-3 DNA duplexes are presented in Figure 3. The
oligonucleotide-wt melts are plotted for comparison. The data
are summarized in Table 1. It can be seen that the thermal
stability of the 14 mer and 17 mer duplexes with am-3 DNA is much
lower than that of the corresponding wt duplexes.

[Figure 3 plots; x-axis: Temperature (°C).]

FIGURE 3, Thermal denaturation of oligonucleotide-wild type
φX174 DNA (■) and oligonucleotide-am-3 φX174 DNA (•) duplexes.
The hybridization and thermal denaturation were performed as in
Figure 2. Left 14-mer and right 17-mer.

The substantial difference in the thermal stability of
perfectly matched and mismatched duplexes suggests that
hybridization of the oligonucleotides to am-3 DNA could be
eliminated with little, if any, effect on hybridization to wt
DNA by the appropriate choice of either filter wash temperature
or hybridization temperature. To test this prediction, wt and
am-3 DNAs were electrophoresed in adjacent lanes of a 1% agarose
gel. The DNA in the gel was transferred to nitrocellulose
essentially as described by Southern (16). The filter strips
were hybridized at 12°C with 32P-labeled 11 mer, 14 mer or 17
mer, washed at 12°C and autoradiographed. Figure 4a shows the
autoradiograph obtained. Hybridization to wt DNA is
approximately equal for all 3 oligonucleotides. The level of
hybridization to am-3 DNA, on the other hand, is dependent on
the length of the oligonucleotide. Very little hybridization to
am-3 DNA is seen for the 11 mer, a slightly greater amount for
the 14 mer, while hybridization of the 17 mer to am-3 DNA
approaches that to wt DNA (see Table 1 for comparison). After
autoradiography, the filter strips were rewashed at higher
temperatures, 30°C for the 11 mer, 37°C for the 14 mer, and 50°C
for the 17 mer. The autoradiograph of the rewashed filters is
shown in Figure 4b. It can be seen that hybridization to the
am-3 DNA is virtually eliminated in each case while
hybridization to wt DNA is only diminished slightly.

FIGURE 4, Effect of filter wash temperature on hybridization
of 32P-labeled oligonucleotides to wt and am-3 φX174 DNA. Equal
amounts of single stranded wt (on the left) and am-3 DNA (on the
right) were electrophoresed, blotted onto nitrocellulose and
hybridized with 32P-labeled 11 mer, 14 mer or 17 mer at 12°C.
The filters were washed at 12°C and autoradiographed using
preflashed X-ray film at -80°C (A). The filters were then
rewashed at 30°C (11-mer), 37°C (14-mer) and 50°C (17-mer) and
re-autoradiographed (B).

Oligonucleotide Hybridization at Elevated Temperatures 

In order to examine the effect of hybridization temperature
on the formation of non-mismatched or mismatched duplexes, wt
and am-3 DNAs were electrophoresed in alternate lanes of a 1%
agarose gel and the DNA transferred to a nitrocellulose filter.
The filter was cut into strips containing one band of wt and
one of am-3 DNA each. The strips were hybridized in 6 X SSC,
10 X Denhardt's, 0.002 µg 32P-labeled 14 mer/ml at 12°C, 25°C,
30°C, 35°C, and 40°C, washed at 12°C, and autoradiographed. The
results are shown in Figure 5. It can be seen that
hybridization to am-3 DNA is dramatically reduced at 25°C and
higher. Hybridization to wt DNA is not affected between 25°C and
35°C with only a slight decrease in hybridization at 40°C (1°C
below




[Figure 5 autoradiograph; strips hybridized at 12°, 25°, 30°,
35° and 40°C.]

FIGURE 5, Effect of hybridization temperature on the formation
of duplexes between 32P-labeled 14-mer and wt (on the left) and
am-3 DNA (on the right). Equal amounts of wt and am-3 DNAs were
electrophoresed in alternate lanes of a 1% agarose gel and
blotted onto nitrocellulose filters. The filters were cut into
strips and hybridized at various temperatures (12°, 25°, 30°,
35° and 40°C) as described in Materials and Methods. The strips
were then removed from the hybridization solution, washed
briefly at 12°C and autoradiographed.
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ABSTRACT Each of four possible sets of mismatches 
(G-A/C-T, C-C/G-G, A-A/T-T, and C-A/G-T) containing the
8 possible single-base-pair mismatches derived from isolated
mutations were examined to test the ability of T4 endonuclease 
VII to consistently detect mismatches in heteroduplexes. At 
least two examples of each set of mismatches were studied for 
cleavage in the complementary pairs of heteroduplexes 
formed between normal and mutant DNA. Four deletion 
mutations were also included in this study. The various 
PCR-derived products used in the formation of heterodu- 
plexes ranged from 133 to 1502 bp. At least one example of 
each set showed cleavage of at least one strand containing a 
mismatch. Cleavage of at least one strand of the pairs of 
heteroduplexes occurred in 17 of the 18 known single-base- 
pair mutations tested, with an A-A/T-T set not being cleaved
in any mismatched strand. We propose that this method may 
be effective in detecting and positioning almost all mutational 
changes when DNA is screened for mutations. 

The detection of mutations is important, particularly in the 
diagnosis of inherited diseases. Changes in the DNA sequences 
of a gene can be harmful and it is important in our under- 
standing of human genetics that we are able to identify and 
classify these alterations and the phenotypic changes that they 
induce. Consequently, the need for a reliable method for the 
detection of mutations in DNA to avoid repetitive sequencing 
of kilobase lengths of DNA has led to the development of a 
number of different screening methods that have both positive 
and negative attributes (see ref. 1 for a review of current 
mutation detection methods). Thus, the search for a reliable 
and efficient approach to the detection of known and unknown 
mutations continues. 

The resolvases are an important group of enzymes that are 
responsible for catalyzing the resolution of branched DNA 
intermediates that form during genetic recombination. Their 
mode of action is directed by bends, kinks, or DNA deviations. 
These enzymes have their effect close to the actual site of DNA 
distortion (2). T4 endonuclease VII, the product of gene 49 of 
the bacteriophage T4 (2), is a resolvase that has been well 
characterized (3-5). It was the first enzyme shown to resolve 
Holliday structures (2). It has also been shown to recognize 
cruciforms (2, 3) and loops (6). It may also be involved in very 
short patch repair (5). It cleaves 3' to, and within 6 nt of,
the point of DNA perturbation, causing double-stranded breakage
(2, 5). T4 endonuclease VII has been shown to cleave
single-base-pair mismatches in model experiments with synthetic
oligonucleotides up to ~43 bp (7). This work examines the
ability of the enzyme
to detect mutations rather than its ability to cleave specific 
mismatches. Thus, when mutant and wild-type homoduplexes 
that differ by a single base pair are melted and hybridized, two 

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement" in 
accordance with 18 U.S.C §1734 solely to indicate this fact. 
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heteroduplex species are formed containing two pairs of 
mismatched bases. The mutation would be detected if any one 
strand containing one of the four mismatched bases were 
cleaved. There are four classes of pairs of mismatched bases 
(type 1, G-A/T-C; type 2, G-T/A-C; type 3, C-C/G-G; type 4,
T-T/A-A) and at least two members of each were tested. Only 
one example showed no cleavage in any of the strands. 
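
The way a single base change gives rise to one of the four classes of mismatch pairs can be sketched as below. This is only an illustration of the pairing logic described above, with the base change given for the sense strand; the function and variable names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The four classes of pairs of mismatched bases, numbered as in the text.
MISMATCH_TYPES = {
    frozenset({frozenset("GA"), frozenset("TC")}): 1,
    frozenset({frozenset("GT"), frozenset("AC")}): 2,
    frozenset({frozenset("C"), frozenset("G")}): 3,   # C-C / G-G
    frozenset({frozenset("T"), frozenset("A")}): 4,   # T-T / A-A
}


def heteroduplex_mismatches(wild_type_base, mutant_base):
    """Return the two mismatches formed after melting and reannealing."""
    wt, mut = wild_type_base.upper(), mutant_base.upper()
    if wt == mut:
        raise ValueError("wild-type and mutant bases are identical")
    # Heteroduplex 1: wild-type sense strand with mutant antisense strand.
    first = (wt, COMPLEMENT[mut])
    # Heteroduplex 2: mutant sense strand with wild-type antisense strand.
    second = (mut, COMPLEMENT[wt])
    return first, second


def mismatch_type(wild_type_base, mutant_base):
    """Classify a base change into mismatch type 1-4."""
    first, second = heteroduplex_mismatches(wild_type_base, mutant_base)
    return MISMATCH_TYPES[frozenset({frozenset(first), frozenset(second)})]


if __name__ == "__main__":
    for wt, mut in [("C", "A"), ("G", "A"), ("C", "G"), ("T", "A")]:
        print(wt, "->", mut, heteroduplex_mismatches(wt, mut),
              "type", mismatch_type(wt, mut))
```

Each of the twelve possible single-base substitutions falls into exactly one of the four types, which is why testing members of each type covers all eight possible mismatches.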

MATERIALS AND METHODS 

Enzyme and Buffers. T4 endonuclease VII was prepared 
from an overexpressing Escherichia coli K38 transformant 
containing gene 49 of T4 phage as described by Kosak and 
Kemper (7). Stock solutions were at 3700 units/µl as
determined by Kosak and Kemper (7), where 1 unit is defined as
that amount of enzyme that catalyzes degradation of 50% of 
very fast sedimenting DNA. The reaction buffer used in the 
assay was prepared as a 10 x concentrate (7). The enzyme 
dilution buffer was prepared as described (7). Assay conditions 
required between 250 and 3000 units of T4 endonuclease VII 
depending on the specific DNA being tested. The annealing 
buffer was prepared as a 2X concentrate (1.2 M NaCl/12 mM
Tris-HCl, pH 7.5/14 mM MgCl2) as described (8). The kinase
buffer, 1x TE (pH 8.0), and the formamide/urea loading dye
were prepared as described (9). 

DNA Preparation. The DNA used in these experiments was 
amplified by PCR from genomic DNA [β-globin, phenylalanine
hydroxylase (PAH), α1-antitrypsin], plasmid DNA
(21-hydroxylase and the mouse mottled Menkes gene), or cDNA
[pyruvate dehydrogenase E1α subunit (PDH E1α),
dihydropteridine reductase, and the rhodopsin gene]. Each region
contained an example of a known mutation except for the 
mouse mottled Menkes gene and dihydropteridine reductase 
mutations, which were previously unpublished. 

Each DNA sample was prepared by PCR amplification. The
β-globin gene (M8, mutation at nt 26 exon 2; M14, mutation
at nt -87; M16, sickle mutation; M21, mutation at nt 17 exon
1) was amplified by using primers a and b (10). A larger
segment of the β-globin gene (M15), which contains mutations
at nt 745, 16, 74, 81, and 666 within IVSII, was amplified by
using primers c and d as described (10). The α1-antitrypsin
gene (M4, mutation at nt 9989) was amplified as described
(11). The PDH E1α gene (M1, F205L; M19, K387fs; M20,
S312fs) was amplified by using primers PDH-P and PDH-E as
described (12). The PAH gene M5 (homozygous mutation at
IVS12 nt 1), M6 (heterozygous mutation at IVS12 nt 1), and
M7 (R408W) was PCR amplified by using primers A and B as
described (13). The PAH gene (M13, F39L exon 2) was PCR
amplified by using the primers 5'-d(GCA TCT TAT CCT GTA
GGA AA)-3' and 5'-d(AGT ACT GAC CTC AAA TAA
GC)-3'. The PCR conditions were 105 s at 95°C, 150 s at 58°C,
and 3 min at 72°C for 35 cycles. The 340-bp section of the

Abbreviations: PAH, phenylalanine hydroxylase; PDH, pyruvate de- 
hydrogenase; EMC, enzyme mismatch cleavage; CCM, chemical cleav- 
age of mismatch. 
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Table 1. 

Summary of mutations tested for by EMC

                                  Sequence                  Mismatch    Total fragment
Type  Sample       Base change    context    Mismatch set   detected    length, bp

1     M1 (Hom)     C → A          ATT[C]GAA  A-G/T-C        C-T         797
1     M2* (Hom)    A → C          CCC[A]ATC  A-G/T-C        A-G/T-C     340
2                  T → C§         CAC[T]TGC  G-T/C-A        ND¶
2     M3 (Hom)     T → C§         CAC[T]TGC  G-T/C-A        G-T         178
2     M4 (Hom)     G → A          GAC[G]AGA  G-T/C-A        C-A         220
2     M5 (Hom)     G → A§         ACA[G]TAA  G-T/C-A        G-T/C-A     245
2     M6 (Het)     G → A§         ACA[G]TAA  G-T/C-A        G-T/C-A     245
2     M7 (Het)     C → T          TAC[C]TCG  G-T/C-A        G-T/C-A     245
2     M8 (Hom)     C → T          ACC[C]AGA  G-T/C-A        G-T         627
2     M9 (Het)     A → G          CCA[A]TGC  G-T/C-A        ?           1300
2     M10‖ (Het)   T → C          AGC[T]CTT  G-T/C-A        G-T         779
2     M11‖ (Hom)   A → G          TGG[A]GGA  G-T/C-A        ?           1502
2     M12‖ (Hom)   C → T          GAT[C]ATT  G-T/C-A                    1502
3     M13 (Het)    C → G          CTT[C]TCA  C-C/G-G        C-C**       133
3     M14 (Hom)    C → G          CAC[C]CTA  C-C/G-G        C-C         627
3     M15* (Het)   C → G          CAG[C]TAC  C-C/G-G        C-C/G-G     1377
3                  C → G          GAC[C]CTT  C-C/G-G        C-C         1377
1                  G → T          GGA[G]AAG  A-G/T-C        A-G/T-C     1377
2                  C → T          TAA[C]AGG  G-T/C-A        G-T/C-A     1377
2                  T → C          TAT[T]TCT  G-T/C-A        G-T/C-A     1377
4     M16 (Het)    A → T          CTG[A]GGA  A-A/T-T        ND          627
4     M17 (Hom)    T → A          TCA[T]CTG  A-A/T-T        A-A/T-T     204
Del   M18‖ (Hom)   Del 33 bp      TAG-AGG    33-bp loops                1502
Del   M19 (Hom)    Del 2 bp       TTT-GTC    AA/TT loops    TT loop**   797
Del   M20 (Hom)    Del 7 bp       GGA-AGT    7-bp loops     ?           797
Del   M21 (Het)    Del 1 bp       CTG-GGA    A/T loops      ND          627

Nonspecific cleavage† (+) was recorded for four entries; the
rows cannot be assigned from this copy.

Mutations are either homozygous (Hom) or heterozygous (Het).
Details of genes involved are given in the text. Sequence
context of the sense strand is given; the bracketed base
(underlined in the original) denotes the nucleotide involved in
the base change. ND, no cleavage was seen in any one of the four
strands containing mismatched bases; ?, cleavage was observed at
the mismatches but the strand cleaved could not be determined;
Del, deletion (bases given flank the mutation).
*[...] in the fragment studied.
†Substantial cleavage was seen in homoduplex as well as
 heteroduplex, but the exact positions have not been determined.
§Exactly the same mutation has been tested for the respective
 genes.
¶This mutation at nt 118 was not detected when in the presence
 of a second mutation at nt 138. However, it was detected in a
 shorter fragment that did not include the mutation at nt 138
 (M3).
‖These mutations were previously unknown.
**Postdigestion end-labeling was performed on these fragments
 also.



21-hydroxylase B gene (M2, mutations at nt 118 and 138) was
amplified by using the primers 5'-d(CTG CTG TGG AAC
TGG TGG AA)-3' and 5'-d(ACA GGT AAG TGG CTC
AGG TC)-3'. The 178-bp section of the 21-hydroxylase B gene
[...] was amplified by using the primers
5'-d(GCT CTT GAG CTA TAA GTG G)-3' and 5'-d(GGG
[...] 5'-d(CTG CAC AGC GGC CTG CTG AA)-3' and 5'-d(CAG
[...] GG)-3'. The PCR conditions for
the 21-hydroxylase A and B genes were 105 s at 95°C, 150 s at
[...] and [...] min at 72°C. The dihydropteridine reductase gene
(M10, L74P at nt 245) was amplified by using the primers GD
[...] was amplified as described (15). The mouse
mottled Menkes gene (M11, mutation at nt 3662; M12, [...];
M18, mutation at nt 4516) was amplified [...]

[...] (Boehringer Mannheim). [...] treatment, the DNA was
ethanol precipitated and the pellet washed three times with 70%
ethanol [...] unincorporated label. The pellet was [...]

[...] heteroduplex formation was performed [...] buffer as
described (8)



except that the annealing temperature was at 65°C for 1 hr 
followed by 20 min at room temperature. Calculations of DNA 
concentration were based on 50-60 ng of unlabeled DNA 
(10X excess) and 5 ng of end-labeled DNA per single reaction. 
Heteroduplexes were prepared in bulk in a 50-µl volume and
the pellet was resuspended in the appropriate volume of
distilled water. For example, if six reactions were required then
300 ng of mutant DNA and 30 ng of labeled wild-type DNA
was used. After the heteroduplex reaction, the pellet was
resuspended in 30 µl of distilled water (i.e., 5 µl of distilled
water is taken per single reaction). An identical procedure was
performed in order to prepare labeled homoduplex DNA for 
the control studies except that an excess amount of unlabeled 
wild-type DNA was hybridized with the labeled wild-type 
DNA. This strategy allows two of the four mismatched bases 
(those present in the labeled strand) to be tested for cleavage 
by the enzyme. 
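
The bulk-preparation arithmetic in the example above (six reactions) can be written out as a small helper. This is a sketch only: the per-reaction defaults are taken from the text (50 ng unlabeled DNA at the lower end of the quoted 50-60 ng range, 5 ng labeled DNA, 5 µl water), and the function name and dictionary keys are illustrative.

```python
def bulk_heteroduplex_amounts(reactions, unlabeled_ng=50.0, labeled_ng=5.0,
                              resuspend_ul_per_reaction=5.0):
    """Total DNA and resuspension volume for a batch of reactions."""
    if reactions < 1:
        raise ValueError("at least one reaction is required")
    return {
        "unlabeled_ng": reactions * unlabeled_ng,
        "labeled_ng": reactions * labeled_ng,
        "resuspend_ul": reactions * resuspend_ul_per_reaction,
        # Ratio of unlabeled to labeled DNA (the 10X excess in the text).
        "excess_fold": unlabeled_ng / labeled_ng,
    }


if __name__ == "__main__":
    # Six reactions, as in the worked example in the text.
    print(bulk_heteroduplex_amounts(6))
```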

Enzyme Mismatch Cleavage (EMC). Five microliters of the
labeled homoduplex or heteroduplex DNA (50-60 ng) was
added to 39 µl of distilled water and 5 µl of 10 x reaction
buffer, all kept on ice. The reaction was initiated by the
addition of 1 µl of the enzyme (100-3000 units/µl as
specified). The stock solution of enzyme was diluted to the
required activity in the enzyme dilution buffer. After addition
of the enzyme, the tubes were spun briefly and incubated at 37°C
for 1 hr unless otherwise specified. In the case of controls the
enzyme was replaced with 1 µl of the enzyme dilution buffer
and these were incubated. After incubation, the samples were
ethanol precipitated, washed in 70% ethanol, dried briefly, and
thoroughly resuspended by vortex mixing in 5 µl of
formamide/urea loading dye. The [...] heated to 100°C and [...]
urea/acrylamide sequencing gel. Cleavage [...] autoradiography
and [...] radiolabeled φX174 Hae III size marker.

Postdigestion End-Labeling. Heteroduplex [...] a 1:10 dilution
of fresh [...] (45 min at 37°C), the enzyme [...] the reaction
mixture was ethanol precipitated. The pellet was washed three
times in 70% ethanol [...] resuspended in 5 µl [...]
heteroduplexes had formed and was performed as described (8).
RESULTS 

[...] combinations possible [...] mutant and wild-type genes
[...] single-base changes. Heteroduplex [...] studied unless
otherwise stated.

Type 1 (Mismatch Set G-A/T-C). M1 [...] homozygous C → A
mutation 87 bp away from the 5' end of the section of the gene
studied [...] probe, the [...] C-T and A-G mismatches. The
asterisk designates [...] the labeled strand. After CCM only
the 87-bp fragment was observed on denaturing acrylamide [...]
modifies only the [...]. [...] EMC, only a single band slightly
larger than the 87 bp [...] suggesting cleavage near the [...]
(Fig. 1).

Type 2 (Mismatch Set G-T/A-C). A 245-bp [...] PAH gene (M5)
was amplified [...] homozygous for a G → A mutation (IVS12 nt
1). This mutation occurred 191 bp from the 5' end of the PCR
product. End-labeled normal DNA was hybridized [...] unlabeled
mutant DNA. After enzyme cleavage, two bands, [...] slightly
larger than the 191-bp band and [...] slightly larger than
the 54-bp band, were observed [...] the A-C mismatch,
respectively (Fig. 2).

Type 3 (Mismatch Set C-C/G-G). The PAH [...] (M13) used here
involves a heterozygous [...] in exon 2, 73 bp from the [...]
133-bp section of the gene studied. Heteroduplex [...] and G-G
mismatches as well as homoduplexes [...] to the normal and
mutant DNA. CCM [...] allowed detection of only the C-C [...]

[...] the cleavage will NOT be observed [...] 3' to [...]

[Figure 1 autoradiograph; lane groups labeled HA and ECM;
full-length band at 797 bp.]

Fig. 1. Autoradiograph of CCM and EMC analysis of the PDH
E1α gene (M1) containing a homozygous C → A mutation. [...]
samples of control homoduplex DNA (C) after [...]. [...] lines,
mutant DNA strands; arrows, [...] cleavage on the end-labeled
strand; [...] label; arrowhead, actual cleavage site of the CCM
reaction. Superscripts refer to band sizes observed by EMC being
[...] expected band sizes as determined by CCM [...]. Lanes M,
end-labeled marker φX174 Hae III.
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mid DNA. This mutant DNA contained a homozygous T -> A 
mutation at base 1004 of the gene, 110 bp from the 5' end of 
the section of the gene studied. This results in a heteroduplex 
species containing both A-A* and T*-T mismatches. EMC on
this sample with end-labeled probe showed two bands: one
slightly greater than 110 bp and the other slightly greater than 
94 bp (Fig. 4). This confirms that the enzyme recognized both 
strands of the probe. The intensity of the products on auto- 
radiography shows that the enzyme recognizes the A-A* 
mismatch with greater efficiency than the T*-T mismatch. 
CCM on this sample showed only the 110-bp fragment ob- 
tained after modification and cleavage of the mismatched T 
base in the sense strand of the probe (Fig. 4). 
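
The band sizes quoted in these results follow from the position of the mutation within the fragment: with both 5' ends labeled, cleavage at the mismatch leaves one labeled product roughly as long as the distance from the 5' end of the sense strand and another roughly as long as the remainder. A minimal sketch of that arithmetic is given below. The 6-nt tolerance reflects the statement that the enzyme cleaves 3' to and within 6 nt of the perturbation; the function names are hypothetical.

```python
def expected_labeled_fragments(total_bp, offset_from_5prime_bp):
    """Approximate sizes (bp) of the two 5'-labeled cleavage products."""
    if not 0 < offset_from_5prime_bp < total_bp:
        raise ValueError("mutation must lie inside the fragment")
    return offset_from_5prime_bp, total_bp - offset_from_5prime_bp


def consistent_with_cleavage(observed_bp, expected_bp, window_nt=6):
    """True if a band is at, or up to window_nt larger than, the expected size."""
    return expected_bp <= observed_bp <= expected_bp + window_nt


if __name__ == "__main__":
    # Fragment length and mutation position for M17 as given in the text.
    print(expected_labeled_fragments(204, 110))
```

For the 204-bp fragment with the mutation 110 bp from the 5' end, this gives the 110-bp and 94-bp sizes against which the observed bands ("slightly greater than") are compared.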

DISCUSSION 

A total of 3 type 1, 13 type 2, 4 type 3, 2 type 4, and 4 deletion 
mutations have been tested by the enzyme cleavage method in 
this study. Four of the single-base-pair mutations detected 
were previously unknown. Of the 18 known single-base
mutational changes tested, only 1 (an A → T change) did not show
cleavage of any strand of the two heteroduplexes (Table 1). 
This may be due to a sequence context feature since of the four 
single loops tested (generated by deletions) only one did not 
show cleavage of any strand, and that one involved the same 
base as the single-base mutational change A -> T present in 
M16. We would like to investigate the reasons for nondetection 



[Figure 2 autoradiograph; bands marked: 245 bp, 191 and 54.]

Fig. 2. Autoradiograph of EMC analysis of the PAH gene (M5)
containing a homozygous G → A mutation. Lanes: 1-4, samples of
control homoduplex DNA (C) after incubation with 0, 250, 500, and
1000 units of T4 endonuclease VII; 5-8, samples of test heteroduplex
DNA (PAH) after incubation with 0, 250, 500, and 1000 units of T4
endonuclease VII. Scheme below is as in Fig. 1.

[Figure 3 autoradiographs (A and B); bands marked: 133 bp, 73
and 60.]

Fig. 3. (A) Autoradiograph of CCM and EMC analysis of the PAH
gene (M13) containing a heterozygous C → G mutation in exon 2.
Lanes: 1-3, samples of control homoduplex DNA (C) after incubation
with hydroxylamine for 0, 1, and 1.5 hr; 4-6, samples of test
heteroduplex DNA (PAH) after incubation with hydroxylamine for 0,
1, and 1.5 hr; 7-11, samples of C after incubation with 0, 1000,
2000, 2500, and 3000 units of T4 endonuclease VII; 12 and 13,
samples of PAH after incubation with 0 and 1000 units of T4
endonuclease VII. (B) Autoradiograph of postdigestion
end-labeling (as described in the text) of the PAH gene (M13)
containing a heterozygous C → G mutation in exon 2. Lanes: 1-3,
samples of control homoduplex DNA (C) after incubation with 0,
250, and 1000 units of T4 endonuclease VII; 4-8, samples of test
heteroduplex DNA (PAH) after incubation with 0, 100, 250, 500,
and 1000 units of T4 endonuclease VII. Schemes below are as in
Fig. 1.

[Figure 4 autoradiograph; bands marked: 204 bp, 110 and 94.]

Fig. 4. Autoradiograph of CCM and EMC analysis of the
21-hydroxylase A gene (M17) in a plasmid containing a homozygous
T → A mutation. Lanes: 1 and 2, samples of control homoduplex
DNA (C) after incubation with osmium tetroxide (OT) for 0 and 5
min; 3 and 4, samples of C after incubation with 0 and 250 units
of T4 endonuclease VII; 5 and 6, samples of test heteroduplex DNA
(A gene) after incubation with OT for 0 and 5 min; 7 and 8,
samples of A gene after incubation with 0 and 250 units of T4
endonuclease VII; 9-11, samples of A gene after incubation with
500 units of T4 endonuclease VII for 1, 3, and 16 hr; 12-14,
samples of A gene after incubation with 1000 units of T4
endonuclease VII for 1, 3, and 16 hr at 37°C. Scheme below is as
in Fig. 1.

of any mutations and attempt to change conditions appropriately
to allow detection. Similarly, we would like to investigate
the reasons for nonspecific cleavage present in some cases (see
Table 1 and Fig. 3).

Our results show that in about half the cases studied,
detection was observed by the cleavage of one of the
heteroduplexes in the set. For example, the heteroduplex
containing the A-G mismatch was cleaved in M2, but we relied on
cleavage of the heteroduplex containing the reciprocal T-C
mismatch for detection in M1 (Table 1). This gives further
support to the view that T4 endonuclease VII is dependent on
sequence context as well as DNA structure (5, 17). It is also
clear from Table 1 that the mismatch pair generally considered
to be the most thermostable (G-T) is recognized efficiently by
T4 endonuclease VII where 8 of 13 type 2 mismatches tested
showed cleavage of G-T-containing heteroduplex. In these cases,
detection depended either solely (in three cases) or in
conjunction with recognition of the complementary C-A mismatch
pair (in five cases). At the other end of the scale, the
mismatch pair considered to be one of the least thermostable
(C-C) was recognized by T4 endonuclease VII in all four cases
tested in this study.

The postdigestion end-labeling method described here was 
developed to apply these findings to screen lengths of DNA in 
the most effective manner. Most experiments were performed 
with excess unlabeled target DNA over labeled probe DNA to 
form duplexes before cleavage. For simple and practical use, 
we propose forming duplexes between equimolar mutant and
wild-type DNA, cleaving and then kinase labeling all 5' OH 
ends before electrophoresis. This allows assay of each strand 
for cleavage without probe production, thus maximizing the 
chances of detecting mutations. When using this method, two 
bands were always observed resulting from the labeling of all 
the free 5' OH ends of the cleavage products. 
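
As a rough illustration of why a single cleavage event gives two bands, the sketch below computes the two fragment lengths from a strand length and a cut position. The function name and the example numbers are hypothetical, and the exact position of the enzyme's cut relative to the mismatch is not modeled.

```python
def expected_fragment_sizes(strand_length, cleavage_offset):
    """Sizes (in nucleotides) of the two products of a single cleavage.

    `cleavage_offset` is the number of nucleotides between the 5' end of
    the strand and the cut. It is a hypothetical input: where the enzyme
    cuts relative to the mismatch is not modeled here.
    """
    if not 0 < cleavage_offset < strand_length:
        raise ValueError("cleavage must fall inside the strand")
    return cleavage_offset, strand_length - cleavage_offset


# Illustrative numbers only, not taken from the experiments described above.
print(expected_fragment_sizes(500, 180))  # (180, 320)
```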

General information about the entry

Entry name                          CSE1_HUMAN
Primary accession number            P55060
Secondary accession numbers         O75432 Q9H5B7 Q9NTS0 Q9UP98 Q9UP99 Q9UPA0
Entered in Swiss-Prot in            Release 34, October 1996
Sequence was last modified in       Release 41, February 2003
Annotations were last modified in   Release 41, February 2003

Name and origin of the protein

Protein name                        Importin-alpha re-exporter
Synonyms                            Chromosome segregation 1-like protein
                                    Cellular apoptosis susceptibility protein
Gene name                           CSE1L or CAS
From                                Homo sapiens (Human) [TaxID: 9606]
Taxonomy                            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                                    Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

References



[1] SEQUENCE FROM NUCLEIC ACID (ISOFORM 1).
TISSUE=Placenta;
MEDLINE=96036098; PubMed=7479798;
Brinkmann U., Brinkmann E., Gallo M., Pastan I.;
"Cloning and characterization of a cellular apoptosis susceptibility gene, the human homologue to
the yeast chromosome segregation gene CSE1.";
Proc. Natl. Acad. Sci. U.S.A. 92:10427-10431(1995).
[2] SEQUENCE FROM NUCLEIC ACID (ISOFORMS 1; 2 AND 3).
TISSUE=Brain;
MEDLINE=99265971; PubMed=10331944;
Brinkmann U., Brinkmann E., Bera T.K., Wellmann A., Pastan I.;
"Tissue-specific alternative splicing of the CSE1L/CAS (cellular apoptosis susceptibility) gene.";
Genomics 58:41-49(1999).

[3] SEQUENCE FROM NUCLEIC ACID.
MEDLINE=21638749; PubMed=11780052;
Deloukas P., Matthews L.H., Ashurst J., Burton J., Gilbert J.G.R., Jones M., Stavrides G., Almeida
J.P., Babbage A.K., Bagguley C.L., Bailey J., Barlow K.F., Bates K.N., Beard L.M., Beare D.M.,
Beasley O.P., Bird C.P., Blakey S.E., Bridgeman A.M., Brown A.J., Buck D., Burrill W.D., Butler
A.P., Carder C., Carter N.P., Chapman J.C., Clamp M., Clark G., Clark L.N., Clark S.Y., Clee C.M.,
Clegg S., Cobley V.E., Collier R.E., Connor R.E., Corby N.R., Coulson A., Coville G.J., Deadman
R., Dhami P.D., Dunn M., Ellington A.G., Frankland J.A., Fraser A., French L., Garner P., Grafham
D.V., Griffiths C., Griffiths M.N.D., Gwilliam R., Hall R.E., Hammond S., Harley J.L., Heath P.D.,
Ho S., Holden J.L., Howden P.J., Huckle E., Hunt A.R., Hunt S.E., Jekosch K., Johnson C.M.,
Johnson D., Kay M.P., Kimberley A.M., King A., Knights A., Laird G.K., Lawlor S., Lehvaeslaiho
M.H., Leversha M.A., Lloyd C., Lloyd D.M., Lovell J.D., Marsh V.L., Martin S.L., McConnachie
L.J., McLay K., McMurray A.A., Milne S.A., Mistry D., Moore M.J.F., Mullikin J.C., Nickerson T.,
Oliver K., Parker A., Patel R., Pearce T.A.V., Peck A.I., Phillimore B.J.C.T., Prathalingam S.R.,
Plumb R.W., Ramsay H., Rice C.M., Ross M.T., Scott C.E., Sehra H.K., Shownkeen R., Sims S.,
Skuce C.D., Smith M.L., Soderlund C., Steward C.A., Sulston J.E., Swann R.M., Sycamore N.,
Taylor R., Tee L., Thomas D.W., Thorpe A., Tracey A., Tromans A.C., Vaudin M., Wall M., Wallis
J.M., Whitehead S.L., Whittaker P., Willey D.L., Williams L., Williams S.A., Wilming L., Wray
P.W., Hubbard T., Durbin R.M., Bentley D.R., Beck S., Rogers J.;
"The DNA sequence and comparative analysis of human chromosome 20.";
Nature 414:865-871(2001).
[4] FUNCTION.
MEDLINE=97462907; PubMed=9323134;
Kutay U., Bischoff F.R., Kostka S., Kraft R., Gorlich D.;
"Export of importin-alpha from the nucleus is mediated by a specific nuclear transport factor.";
Cell 90:1061-1071(1997).

Comments 

• FUNCTION: Export receptor for importin alpha. Mediates importin-alpha reexport from the 
nucleus to the cytoplasm after import substrates have been released into the nucleoplasm. 

• SUBUNIT: Binds with high affinity to importin-alpha only in the presence of RanGTP. The
complex is dissociated by the combined action of RanBP1 and RanGAP1.

• SUBCELLULAR LOCATION: Nuclear and cytoplasmic. 

• ALTERNATIVE PRODUCTS: 

o Alternative splicing [3 named forms] 

Name 1 

Isoform ID P55060-1

This is the isoform sequence displayed in this entry.

Name 2

Isoform ID P55060-2

Features which should be applied to build the isoform sequence: VSP_001222,
VSP_001223.

Name 3

Isoform ID P55060-3

Features which should be applied to build the isoform sequence: VSP_001224,
VSP_001225.

• TISSUE SPECIFICITY: HIGHLY EXPRESSED IN PROLIFERATING CELLS.

• SIMILARITY: BELONGS TO THE CSE1 FAMILY. 

• SIMILARITY: Contains 1 importin N-terminal domain. 
Copyright 

This SWISS-PROT entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics 
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and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use by non-profit 
institutions as long as its content is in no way modified and this statement is not removed. Usage by and for commercial 
entities requires a license agreement (See http://www.isb-sib.ch/announce/ or send an email to license@isb-sib.ch) 

Cross-references

EMBL           U33286; AAC50367.1; -.
               AF053640; AAC35007.1; -.
               AF053641; AAC35008.1; -.
               AF053642; AAC35009.1; -.
               AF053651; AAC35297.1; -.
               AF053644; AAC35297.1; JOINED.
               AF053645; AAC35297.1; JOINED.
               AF053646; AAC35297.1; JOINED.
               AF053647; AAC35297.1; JOINED.
               AF053648; AAC35297.1; JOINED.
               AF053649; AAC35297.1; JOINED.
               AF053650; AAC35297.1; JOINED.
               AL121903; CAB86644.1; -.
               AL121903; CAC33854.1; -.
               AL133174; CAC14081.1; -.
               AL133174; CAC14082.1; -.
PIR            I39166; I39166.
Genew          HGNC:2431; CSE1L.
CleanEx        HGNC:2431; CSE1L.
MIM            601342 [NCBI / EBI].
GeneCards      CSE1L.
GeneLynx       CSE1L; Homo sapiens.
GO             GO:0005737; Cellular component: cytoplasmic chromosome (traceable author statement).
               GO:0005634; Cellular component: nuclear chromosome (traceable author statement).
               GO:0008262; Molecular function: importin-alpha export receptor activity (traceable author statement).
               GO:0006915; Biological process: apoptosis (traceable author statement).
               GO:0008283; Biological process: cell proliferation (traceable author statement).
SOURCE         CSE1L; Homo sapiens.
Ensembl        P55060; Homo sapiens. [Entry / Contig view]
InterPro       IPR005043; CAS_CSE1.
               IPR001494; Importinb_N.
               Graphical view of domain structure.
Pfam           PF03378; CAS_CSE1; 1.
               PF03810; IBN_NT; 1.
PROSITE        PS50166; IMPORTIN_B_NT; 1.
ProDom         [Domain structure / List of seq. sharing at least 1 domain]
HOVERGEN       [Family / Alignment / Tree]
BLOCKS         P55060.
ProtoNet       P55060.
ProtoMap       P55060.
PRESAGE        P55060.
DIP            P55060.
ModBase        P55060.
SWISS-2DPAGE   Get region on 2D PAGE.

Keywords 

Transport; Protein transport; Nuclear protein; Alternative splicing.

Features

Key        From   To   Length   Description                                    FTId
DOMAIN       29  102       74   IMPORTIN N-TERMINAL.
VARSPLIC    190  195            ATIELC -> VWNASW (in isoform 2).               VSP_001222
VARSPLIC    196  971            Missing (in isoform 2).                        VSP_001223
VARSPLIC    943  945            VPS -> TYF (in isoform 3).                     VSP_001224
VARSPLIC    946  971            Missing (in isoform 3).                        VSP_001225
CONFLICT    231  233            WEG -> FED (IN REF. 2; AAC35297 AND 3).
CONFLICT    514  514            G -> E (IN REF. 2; AAC35297 AND 3).
CONFLICT    848  848            K -> N (IN REF. 1).
CONFLICT    934  934            K -> M (IN REF. 1).
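
The VARSPLIC rows describe how each alternative isoform is derived from the displayed sequence. A minimal sketch of that derivation follows, assuming 1-based inclusive coordinates as in the table and treating "Missing" as a deletion. The function name, the tuple layout, and the toy input are illustrative assumptions, not part of Swiss-Prot.

```python
from typing import List, Optional, Tuple

# (start, end, replacement): 1-based inclusive coordinates as in the feature
# table; replacement is None when the feature reads "Missing".
Variant = Tuple[int, int, Optional[str]]


def apply_varsplic(canonical: str, variants: List[Variant]) -> str:
    """Return the isoform obtained by applying VARSPLIC edits to `canonical`."""
    seq = canonical
    # Work from the C-terminal end so that earlier coordinates stay valid.
    for start, end, replacement in sorted(variants, key=lambda v: v[0], reverse=True):
        if not (1 <= start <= end <= len(canonical)):
            raise ValueError(f"feature {start}-{end} lies outside the sequence")
        seq = seq[:start - 1] + (replacement or "") + seq[end:]
    return seq


# Toy input: the first ten residues of the entry with made-up edits.
print(apply_varsplic("MELSDANLQT", [(4, 6, "XYZ"), (7, 10, None)]))  # MELXYZ
```

With the full 971-residue sequence, the isoform 2 features would be passed as `[(190, 195, "VWNASW"), (196, 971, None)]`, which leaves 195 residues, and the isoform 3 features as `[(943, 945, "TYF"), (946, 971, None)]`, which leaves 945.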







Sequence information

Length: 971 AA   Molecular weight: 110325 Da   CRC64: 850F2F07B954E316 [This is a checksum on the sequence]

        10         20         30         40         50         60
MELSDANLQT LTEYLKKTLD PDPAIRRPAE KFLESVEGNQ NYPLLLLTLL EKSQDNVIKV

        70         80         90        100        110        120
CASVTFKNYI KRNWRIVEDE PNKICEADRV AIKANIVHLM LSSPEQIQKQ LSDAISIIGR

       130        140        150        160        170        180
EDFPQKWPDL LTEMVNRFQS GDFHVINGVL RTAHSLFKRY RHEFKSNELW TEIKLVLDAF

       190        200        210        220        230        240
ALPLTNLFKA TIELCSTHAN DASALRILFS SLILISKLFY SLNFQDLPEF WEGNMETWMN

       250        260        270        280        290        300
NFHTLLTLDN KLLQTDDEEE AGLLELLKSQ ICDNAALYAQ KYDEEFQRYL PRFVTAIWNL

       310        320        330        340        350        360
LVTTGQEVKY DLLVSNAIQF LASVCERPHY KNLFEDQNTL TSICEKVIVP NMEFRAADEE

       370        380        390        400        410        420
AFEDNSEEYI RRDLEGSDID TRRRAACDLV RGLCKFFEGP VTGIFSGYVN SMLQEYAKNP

       430        440        450        460        470        480
SVNWKHKDAA IYLVTSLASK AQTQKHGITQ ANELVNLTEF FVNHILPDLK SANVNEFPVL

       490        500        510        520        530        540
KADGIKYIMI FRNQVPKEHL LVSIPLLINH LQAGSIVVHT YAAHALERLF TMRGPNNATL

       550        560        570        580        590        600
FTAAEIAPFV EILLTNLFKA LTLPGSSENE YIMKAIMRSF SLLQEAIIPY IPTLITQLTQ

       610        620        630        640        650        660
KLLAVSKNPS KPHFNHYMFE AICLSIRITC KANPAAVVNF EEALFLVFTE ILQNDVQEFI

       670        680        690        700        710        720
PYVFQVMSLL LETHKNDIPS SYMALFPHLL QPVLWERTGN IPALVRLLQA FLERGSNTIA

       730        740        750        760        770        780
SAAADKIPGL LGVFQKLIAS KANDHQGFYL LNSIIEHMPP ESVDQYRKQI FILLFQRLQN

       790        800        810        820        830        840
SKTTKFIKSF LVFINLYCIK YGALALQEIF DGIQPKMFGM VLEKIIIPEI QKVSGNVEKK

       850        860        870        880        890        900
ICAVGITKLL TECPPMMDTE YTKLWTPLLQ SLIGLFELPE DDTIPDEEHF IDIEDTPGYQ

       910        920        930        940        950        960
TAFSQLAFAG KKEHDPVGQM VNNPKIHLAQ SLHKLSTACP GRVPSMVSTS LNAEALQYLQ

       970
GYLQAASVTL L

ID   CSE1_HUMAN     STANDARD;      PRT;   971 AA.
AC   P55060; Q9UP99; O75432; Q9NTS0; Q9H5B7; Q9UP98; Q9UPA0;
DT   01-OCT-1996 (Rel. 34, Created)
DT   01-MAR-2002 (Rel. 41, Last sequence update)
DT   01-MAR-2002 (Rel. 41, Last annotation update)
DE   IMPORTIN-ALPHA RE-EXPORTER (CHROMOSOME SEGREGATION 1-LIKE PROTEIN)
DE   (CELLULAR APOPTOSIS SUSCEPTIBILITY PROTEIN).
GN   CSE1L OR CAS.
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;

RN   [1]
RP   SEQUENCE FROM N.A. (ISOFORM 1).
RC   TISSUE=PLACENTA;
RX   MEDLINE=96036098; PubMed=7479798;
RA   Brinkmann U., Brinkmann E., Gallo M., Pastan I.;
RT   "Cloning and characterization of a cellular apoptosis susceptibility
RT   gene, the human homologue to the yeast chromosome segregation gene
RT   CSE1.";
RL   Proc. Natl. Acad. Sci. U.S.A. 92:10427-10431(1995).

RN   [2]
RP   SEQUENCE FROM N.A. (ISOFORMS 1; 2 AND 3).
RC   TISSUE=BRAIN;
RX   MEDLINE=99265971; PubMed=10331944;
RA   Brinkmann U., Brinkmann E., Bera T.K., Wellmann A., Pastan I.;
RT   "Tissue-specific alternative splicing of the CSE1L/CAS (cellular
RT   apoptosis susceptibility) gene.";
RL   Genomics 58:41-49(1999).

RN   [3]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=21638749; PubMed=11780052;
RA   Deloukas P., Matthews L.H., Ashurst J., Burton J., Gilbert J.G.R.,
RA   Jones M., Stavrides G., Almeida J.P., Babbage A.K., Bagguley C.L.,
RA   Bailey J., Barlow K.F., Bates K.N., Beard L.M., Beare D.M.,
RA   Beasley O.P., Bird C.P., Blakey S.E., Bridgeman A.M., Brown A.J.,
RA   Buck D., Burrill W., Butler A.P., Carder C., Carter N.P.,
RA   Chapman J.C., Clamp M., Clark G., Clark L.N., Clark S.Y., Clee C.M.,
RA   Clegg S., Cobley V.E., Collier R.E., Connor R., Corby N.R.,
RA   Coulson A., Coville G.J., Deadman R., Dhami P., Dunn M.,
RA   Ellington A.G., Frankland J.A., Fraser A., French L., Garner P.,
RA   Grafham D.V., Griffiths C., Griffiths M.N.D., Gwilliam R., Hall R.E.,
RA   Hammond S., Harley J.L., Heath P.D., Ho S., Holden J.L., Howden P.J.,
RA   Huckle E., Hunt A.R., Hunt S.E., Jekosch K., Johnson C.M., Johnson D.,
RA   Kay M.P., Kimberley A.M., King A., Knights A., Laird G.K., Lawlor S.,
RA   Lehvaslaiho M.H., Leversha M., Lloyd C., Lloyd D.M., Lovell J.D.,
RA   Marsh V.L., Martin S.L., McConnachie L.J., McLay K., McMurray A.A.,
RA   Milne S., Mistry D., Moore M.J.F., Mullikin J.C., Nickerson T.,
RA   Oliver K., Parker A., Patel R., Pearce T.A.V., Peck A.I.,
RA   Phillimore B.J.C.T., Prathalingam S.R., Plumb R.W., Ramsay H.,
RA   Rice C.M., Ross M.T., Scott C.E., Sehra H.K., Shownkeen R., Sims S.,
RA   Skuce C.D., Smith M.L., Soderlund C., Steward C.A., Sulston J.E.,
RA   Swann M., Sycamore N., Taylor R., Tee L., Thomas D.W., Thorpe A.,
RA   Tracey A., Tromans A.C., Vaudin M., Wall M., Wallis J.M.,
RA   Whitehead S.L., Whittaker P., Willey D.L., Williams L., Williams S.A.,
RA   Wilming L., Wray P.W., Hubbard T., Durbin R.M., Bentley D.R., Beck S.,
RA   Rogers J.;
RT   "The DNA sequence and comparative analysis of human chromosome 20.";
RL   Nature 414:865-871(2001).

RN   [4]
RP   FUNCTION.
RX   MEDLINE=97462907; PubMed=9323134;
RA   Kutay U., Bischoff F.R., Kostka S., Kraft R., Gorlich D.;
RT   "Export of importin-alpha from the nucleus is mediated by a specific
RT   nuclear transport factor.";
RL   Cell 90:1061-1071(1997).

CC   -!- FUNCTION: EXPORT RECEPTOR FOR IMPORTIN ALPHA. MEDIATES IMPORTIN-
CC       ALPHA REEXPORT FROM THE NUCLEUS TO THE CYTOPLASM AFTER IMPORT
CC       SUBSTRATES HAVE BEEN RELEASED INTO THE NUCLEOPLASM.
CC   -!- SUBUNIT: BINDS WITH HIGH AFFINITY TO IMPORTIN-ALPHA ONLY IN THE
CC       PRESENCE OF RANGTP. THE COMPLEX IS DISSOCIATED BY THE COMBINED
CC       ACTION OF RANBP1 AND RANGAP1.
CC   -!- SUBCELLULAR LOCATION: NUCLEAR AND CYTOPLASMIC.
CC   -!- ALTERNATIVE PRODUCTS: 3 ISOFORMS; 1 (SHOWN HERE), 2 AND 3; ARE
CC       PRODUCED BY ALTERNATIVE SPLICING.
CC   -!- TISSUE SPECIFICITY: HIGHLY EXPRESSED IN PROLIFERATING CELLS.
CC   -!- SIMILARITY: BELONGS TO THE CSE1 FAMILY.
CC
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC

DR   EMBL; U33286; AAC50367.1; -.
DR   EMBL; AF053640; AAC35007.1; -.
DR   EMBL; AF053641; AAC35008.1; -.
DR   EMBL; AF053642; AAC35009.1; -.
DR   EMBL; AF053651; AAC35297.1; -.
DR   EMBL; AF053644; AAC35297.1; JOINED.
DR   EMBL; AF053645; AAC35297.1; JOINED.
DR   EMBL; AF053646; AAC35297.1; JOINED.
DR   EMBL; AF053647; AAC35297.1; JOINED.
DR   EMBL; AF053648; AAC35297.1; JOINED.
DR   EMBL; AF053649; AAC35297.1; JOINED.
DR   EMBL; AF053650; AAC35297.1; JOINED.
DR   EMBL; AL121903; CAB86644.1; -.
DR   EMBL; AL121903; CAC33854.1; -.
DR   EMBL; AL133174; CAC14081.1; -.
DR   EMBL; AL133174; CAC14082.1; -.
DR   MIM; 601342; -.
DR   InterPro; IPR001494; IBN_NT.
DR   Pfam; PF03378; CAS_CSE1; 3.
KW   Transport; Protein transport; Nuclear protein; Alternative splicing.

FT   VARSPLIC    190    195       ATIELC -> VWNASW (IN ISOFORM 2).
FT   VARSPLIC    196    971       MISSING (IN ISOFORM 2).
FT   VARSPLIC    943    945       VPS -> TYF (IN ISOFORM 3).
FT   VARSPLIC    946    971       MISSING (IN ISOFORM 3).
FT   CONFLICT    231    233       WEG -> FED (IN REF. 2; AAC35297 AND REF.
FT                                3).
FT   CONFLICT    514    514       G -> E (IN REF. 2; AAC35297 AND REF. 3).
FT   CONFLICT    848    848       K -> N (IN REF. 1).
FT   CONFLICT    934    934       K -> M (IN REF. 1).

SQ   SEQUENCE   971 AA;  110325 MW;  850F2F07B954E316 CRC64;
     MELSDANLQT LTEYLKKTLD PDPAIRRPAE KFLESVEGNQ NYPLLLLTLL EKSQDNVIKV
     CASVTFKNYI KRNWRIVEDE PNKICEADRV AIKANIVHLM LSSPEQIQKQ LSDAISIIGR
     EDFPQKWPDL LTEMVNRFQS GDFHVINGVL RTAHSLFKRY RHEFKSNELW TEIKLVLDAF
     ALPLTNLFKA TIELCSTHAN DASALRILFS SLILISKLFY SLNFQDLPEF WEGNMETWMN
     NFHTLLTLDN KLLQTDDEEE AGLLELLKSQ ICDNAALYAQ KYDEEFQRYL PRFVTAIWNL
     LVTTGQEVKY DLLVSNAIQF LASVCERPHY KNLFEDQNTL TSICEKVIVP NMEFRAADEE
     AFEDNSEEYI RRDLEGSDID TRRRAACDLV RGLCKFFEGP VTGIFSGYVN SMLQEYAKNP
     SVNWKHKDAA IYLVTSLASK AQTQKHGITQ ANELVNLTEF FVNHILPDLK SANVNEFPVL
     KADGIKYIMI FRNQVPKEHL LVSIPLLINH LQAGSIVVHT YAAHALERLF TMRGPNNATL
     FTAAEIAPFV EILLTNLFKA LTLPGSSENE YIMKAIMRSF SLLQEAIIPY IPTLITQLTQ
     KLLAVSKNPS KPHFNHYMFE AICLSIRITC KANPAAVVNF EEALFLVFTE ILQNDVQEFI
     PYVFQVMSLL LETHKNDIPS SYMALFPHLL QPVLWERTGN IPALVRLLQA FLERGSNTIA
     SAAADKIPGL LGVFQKLIAS KANDHQGFYL LNSIIEHMPP ESVDQYRKQI FILLFQRLQN
     SKTTKFIKSF LVFINLYCIK YGALALQEIF DGIQPKMFGM VLEKIIIPEI QKVSGNVEKK
     ICAVGITKLL TECPPMMDTE YTKLWTPLLQ SLIGLFELPE DDTIPDEEHF IDIEDTPGYQ
     TAFSQLAFAG KKEHDPVGQM VNNPKIHLAQ SLHKLSTACP GRVPSMVSTS LNAEALQYLQ
     GYLQAASVTL L
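
Because the SQ line states the expected residue count, a sequence block like the one above can be checked for transcription damage by comparing that count with the residues actually present. The sketch below is one way to do this. The function name, the regular expression, and the toy record (whose MW and CRC64 values are placeholders) are assumptions for illustration, and the CRC64 checksum itself is not recomputed.

```python
import re

# Matches a header such as
# "SQ   SEQUENCE   971 AA;  110325 MW;  850F2F07B954E316 CRC64;"
SQ_HEADER = re.compile(
    r"SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-F]+)\s+CRC64\s*;"
)


def parse_sq_block(lines):
    """Parse an SQ header line plus the sequence lines that follow it."""
    match = SQ_HEADER.match(lines[0])
    if match is None:
        raise ValueError("first line is not an SQ header")
    declared_length = int(match.group(1))
    # Drop the spaces between 10-residue groups; skip the "//" terminator.
    residues = "".join(
        "".join(line.split()) for line in lines[1:] if not line.startswith("//")
    )
    return {
        "declared_length": declared_length,
        "mol_weight": int(match.group(2)),
        "crc64": match.group(3),
        "sequence": residues,
        "length_ok": len(residues) == declared_length,
    }


# Toy record: the MW and CRC64 values are placeholders, not real data.
toy = [
    "SQ   SEQUENCE   11 AA;  1000 MW;  0000000000000000 CRC64;",
    "     GYLQAASVTL L",
    "//",
]
print(parse_sq_block(toy)["length_ok"])  # True
```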
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